Abstract. The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied by graphical models with hidden variables. In this paper we derive the BIC score for Bayesian networks in the case of binary data when the underlying graph is a rooted tree and all the inner nodes represent hidden variables. This provides a direct generalization of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is a connection between the asymptotic approximation of Laplace integrals and the real log-canonical threshold.



1. Introduction 

A key step in the Bayesian approach to learning graphical models is to compute the marginal likelihood of the data, i.e. the observed likelihood function averaged over the parameters with respect to the prior distribution. Given a fully observed system, the theory of graphical models provides a simple way to obtain the marginal likelihood (see e.g. [5], [8]). However, when some of the variables in the system are hidden (i.e. never observed), the exact determination of the marginal likelihood is typically intractable (e.g. [4], [?]). Therefore, there is a need to develop efficient approximate techniques for computing the marginal likelihood.

In this paper we focus on large sample approximations for the marginal likelihood called the BIC approximation. Let $\mathcal{M}$ be a parametric discrete model and $X^{(N)} = (X^1, \ldots, X^N)$ be a random sample from $\mathcal{M}$ of size $N$. By $Z(N)$ we denote the marginal likelihood and by $L(\theta; X^{(N)} | \mathcal{M}) = \mathbb{P}(X^{(N)} | \mathcal{M}, \theta)$ the observed likelihood function. Thus

$$Z(N) = \mathbb{P}(X^{(N)} | \mathcal{M}) = \int_\Theta L(\theta; X^{(N)} | \mathcal{M})\, \varphi(\theta)\, d\theta, \tag{1}$$

where $\theta$ denotes the model parameters, $\Theta \subseteq \mathbb{R}^d$ is the parameter space, and $\varphi(\theta)$ is a prior distribution on $\Theta$ given the model $\mathcal{M}$.

In statistical theory, to obtain the BIC approximation we usually require that the observed likelihood is maximized at a unique point in the interior of the parameter space. For the class of problems for which this assumption is satisfied, Schwarz [18] showed that as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{d}{2} \log N + O(1), \tag{2}$$

where $\ell_N$ is the maximum value of the log-likelihood and $d = \dim \Theta$. The same approximation works if the observed likelihood is maximized over a finite number of points. Geometrically, for large sample sizes the integrand in (1) concentrates around the maxima (see Figure 1). This enables us to apply the Laplace approximation locally in the neighborhood of each maximum.

Figure 1. The case when the observed likelihood is maximized over a finite number of points.

It can be proved (see Proposition 2.7) that the above formula generalizes to the case when the set over which the likelihood is maximized forms a sufficiently regular compact subset of the ambient space (see Figure 2). We denote this subset by $\hat\Theta$. In this case, as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{d - d'}{2} \log N + O(1), \tag{3}$$

where $d' = \dim \hat\Theta$. Note that in our case $\hat\Theta$ is the set of zeros of a real analytic function. Therefore it is always a semianalytic set, i.e. given by $\{g_1(\theta) \ge 0, \ldots, g_r(\theta) \ge 0\}$, where the $g_i$ are analytic functions. It follows that its dimension is well defined (see [?, Remark 2.12]).

In the case of models with hidden variables, for some data sets the locus of the points maximizing the likelihood may not be sufficiently regular. In this case the likelihood has a different asymptotic behavior around the singular points, and relatively more mass of the marginal likelihood integral is concentrated in neighborhoods of singular points (see Figure 3). For these points we cannot use the Laplace approximation. Nevertheless, the computation of the BIC approximation is still possible by using the results of Watanabe [21] and linking them to earlier works of Arnold, Varchenko and collaborators (see e.g. [1]). This approximation differs from the standard BIC formula. First, the coefficient of $\log N$ can be different from $-d/2$. Second, we sometimes obtain an additional $\log\log N$ term affecting the asymptotics (see Theorem 2.3).
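A minimal sketch of what changes, written in Python with helper names of our own choosing: the function below evaluates only the leading terms $\ell_N - \lambda \log N + (m - 1)\log\log N$, where $\lambda$ plays the role of the real log-canonical threshold and $m$ its multiplicity (the form made precise in Theorem 2.3), and it drops the $O(1)$ remainder. Schwarz's formula (2) is the special case $\lambda = d/2$, $m = 1$.

```python
import math


def log_marginal_leading_terms(loglik_max: float, rlct: float, mult: int, n: int) -> float:
    """Leading terms l_N - rlct * log N + (mult - 1) * log log N of log Z(N).

    The O(1) remainder is dropped. n must exceed 1 so that log log n is defined.
    """
    return loglik_max - rlct * math.log(n) + (mult - 1) * math.log(math.log(n))


def schwarz_bic(loglik_max: float, d: int, n: int) -> float:
    """Formula (2): the regular case, threshold d/2 with multiplicity one."""
    return log_marginal_leading_terms(loglik_max, d / 2, 1, n)


# illustrative numbers only: a regular model with d = 5 versus a singular
# model whose threshold is 2 with multiplicity 2
print(schwarz_bic(-1234.5, 5, 1000))
print(log_marginal_leading_terms(-1234.5, 2.0, 2, 1000))
```

Because the $O(1)$ term is dropped, such values are only meaningful when compared across models fitted to the same data.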

In this paper we consider an important model class with a large number of hidden variables called the general Markov model. This model class is extensively used in phylogenetics (e.g. [19, Chapter 8]) and in causality analysis (e.g. [15]). The general Markov model is a Bayesian network on a tree. Thus let $T = (V, E)$ be a tree with vertex set $V$ and edge set $E$. Let $T^r$ denote a tree rooted in $r$, i.e. a tree with one distinguished vertex $r$ and all the edges directed away from $r$. Let $Y = (Y_v)_{v \in V}$ be a collection of binary random variables indexed by the set of vertices of $T$. We assume that all the inner nodes represent hidden random variables. Hence the general Markov model, denoted by $\mathcal{M}_T$, is a family of marginal distributions over the vector of random variables representing the leaves of $T$.

Figure 2. The case when the observed likelihood is maximized over an infinite but smooth subset given by $xy = 1$ for $x, y \in [-1, 1]$.

Figure 3. The case when the observed likelihood is maximized over a singular subset given by $xy = 0$ for $x, y \in [-1, 1]$.

A surprising fact proved in this paper is that, given that the sample proportions lie within the model class, the zeros in the sample covariance matrix of the vector of observed random variables completely determine the asymptotics for the model $\mathcal{M}_T$. In this paper, following [?], we always assume:

(A1): The prior distribution $\varphi: \Theta \to \mathbb{R}$ is strictly positive, bounded and smooth on $\Theta$.

(A2): There exists $N_0$ such that $\hat p^{(N)} = \hat p \in \mathcal{M}$ for all $N > N_0$, and $\hat p$ has positive entries.

For a given sample covariance matrix $\Sigma = [\mu_{ij}]$, let $l_2$ denote the number of inner nodes $v$ of $T$ such that for each triple $i, j, k$ of leaves separated in $T$ by $v$ we have $\mu_{ij}\mu_{ik}\mu_{jk} = 0$, but there exist leaves $i, j$ separated by $v$ such that $\mu_{ij} \neq 0$. Here we say that two nodes $u, v$ of $T$ are separated by another node $w$ if $w$ lies on the unique path between $u$ and $v$. We define a degenerate node as an inner node $v$ such that for any two leaves $i, j$ separated by $v$ we have $\mu_{ij} = 0$. All other nodes are called nondegenerate. We denote by $n_e$ the number of edges of $T$ and by $n_v$ the number of its nodes.

Theorem 1.1. Let $T^r$ be a rooted tree with $n$ leaves representing binary random variables $X_1, \ldots, X_n$ and assume that their joint distribution lies in the general Markov model $\mathcal{M}_T$. Let $X^{(N)}$ be $N$ independent realizations of this vector and let $\hat p^{(N)}$ be the corresponding sample proportions. With assumptions (A1) and (A2), if there are no degenerate nodes then as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{n_v + n_e - 2l_2}{2} \log N + O(1),$$

where $\ell_N$ is the maximum log-likelihood value.

In general, if there are degenerate nodes the computation of the BIC approximation is much harder, because the likelihood in this case is maximized over a singular subset of the parameter space. In this paper we obtain a closed form formula for the BIC approximation in the case of trivalent trees, i.e. trees in which each inner node has valency three. This is given in Theorem 1.2. Theorems 1.1 and 1.2 are the main results of this paper.

Let $l_3$ denote the number of inner nodes $v$ such that $\mu_{ij} \neq 0$ for every pair $i, j \in [n]$ whose connecting path crosses $v$.

Theorem 1.2. Let $T = (V, E)$ be a rooted trivalent tree with $n \ge 3$ leaves and root $r$. Let $X = (X_1, \ldots, X_n)$ be a binary random vector representing the leaves of $T$ and assume that the joint distribution of $X$ lies in $\mathcal{M}_T$. Let $X^{(N)}$ be a random sample given by $N$ independent realizations of $X$ and $\hat p^{(N)}$ the corresponding sample proportions. With assumptions (A1) and (A2), if $r$ is degenerate but all its neighbors are not, then as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{3n + l_2 + 5l_3 - 1}{4} \log N + O(1).$$

In all other cases, as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{3n + l_2 + 5l_3}{4} \log N + c \log\log N + O(1),$$

where $c \ge 0$. Moreover $c = 0$ always if $r$ is nondegenerate or if $r$ and all its neighbors are degenerate.
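The two theorems overlap on trivalent trees without degenerate nodes, and their $\log N$ coefficients should then agree. The quick check below is a sketch resting on an assumption we state explicitly: when no node is degenerate, every inner node of a trivalent tree is counted by exactly one of $l_2$ and $l_3$, so $l_2 + l_3 = n - 2$. It also uses the fact that a trivalent tree with $n$ leaves has $n_v = 2n - 2$ and $n_e = 2n - 3$. The function names are ours.

```python
from fractions import Fraction


def penalty_thm_1_1(n_v: int, n_e: int, l2: int) -> Fraction:
    """Coefficient of log N in Theorem 1.1."""
    return Fraction(n_v + n_e - 2 * l2, 2)


def penalty_thm_1_2(n: int, l2: int, l3: int, root_case: bool = False) -> Fraction:
    """Coefficient of log N in Theorem 1.2.

    root_case=True is the case "r degenerate but all its neighbors nondegenerate".
    """
    return Fraction(3 * n + l2 + 5 * l3 - (1 if root_case else 0), 4)


def check_trivalent(n: int) -> bool:
    """Compare both coefficients for every split l2 + l3 = n - 2 (assumption above)."""
    n_v, n_e = 2 * n - 2, 2 * n - 3  # n - 2 inner nodes of valency three
    return all(
        penalty_thm_1_1(n_v, n_e, l2) == penalty_thm_1_2(n, l2, n - 2 - l2)
        for l2 in range(n - 1)
    )


print(all(check_trivalent(n) for n in range(3, 20)))
```

For $l_2 = 0$ both coefficients equal $(4n - 5)/2$, half the dimension $n_v + n_e$ of the parameter space, i.e. the classical penalty in (2).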

Following [16], the main method of proof is to change the coordinates of the models so that the induced parameterization becomes simple. This gives us much better insight into the model structure (see [25], [24]). Since the BIC approximation is invariant with respect to these changes, the reparameterized problem still gives the solution to the original question. The main analytical tool is the real log-canonical threshold (e.g. [17], [21]). This is an important geometric invariant which in certain cases can be computed in a relatively simple way using discrete geometry. The relevance of this invariant to the BIC approximation is given by Theorem 2.3. The techniques developed in this paper can also be applied to obtain the BIC approximation in the non-trivalent case.
The paper is organized as follows. In Section 2 we present the theory of asymptotic approximation of marginal likelihood integrals. This theory allows us to approximate the marginal likelihood without the standard regularity assumptions. Theorem 2.3 links these concepts with the real log-canonical threshold, which allows us to use simple algebraic arguments. In Section 3 we define Bayesian networks on rooted trees. We also obtain a simple result on the BIC approximation in the case when the observed likelihood is maximized over a sufficiently smooth subset of the parameter space. This gives a simple proof of Theorem 1.1. The proof of Theorem 1.2 is more technical and is therefore divided into three main steps. By Theorem 2.3, to obtain the asymptotic approximation we need to compute a certain real log-canonical threshold. In the first step, in Section 4, following [11] we introduce the concept of the real log-canonical threshold of an ideal. Theorem 4.2 reduces our computations to the real log-canonical threshold of an ideal induced by the parametrization of the given model. This result applies to general discrete statistical models. Theorem 4.6 gives an additional reduction which can be obtained only for tree models. The second step is given in Section 5, where we show that the computations can be reduced to two distinct cases. One of them is the case already considered in Section 3. The second case is more complicated and requires the method of Newton diagrams. We analyze this case in Section 6. Finally, in Section 7 we combine all the results.



2. Asymptotics of marginal likelihood integrals

In this section we introduce the real log-canonical threshold and link it with the problem of asymptotic approximation of Laplace integrals. We show how this enables us to obtain the BIC approximation for a general class of statistical models.

2.1. The real log-canonical threshold. Given $\theta_0 \in \mathbb{R}^d$, let $A_{\theta_0}(\mathbb{R}^d)$ be the ring of real-valued functions $f: \mathbb{R}^d \to \mathbb{R}$ that are analytic at $\theta_0$. Given a subset $\Theta \subseteq \mathbb{R}^d$, let $A_\Theta(\mathbb{R}^d)$ be the ring of real functions analytic at each point $\theta_0 \in \Theta$. If $f \in A_\Theta(\mathbb{R}^d)$, then for every $\theta_0 \in \Theta$, $f$ can be locally represented as a power series centered at $\theta_0$. Denote by $A^{\ge}_\Theta(\mathbb{R}^d)$ the subset of $A_\Theta(\mathbb{R}^d)$ consisting of all non-negative functions. Usually the ambient space is clear from the context, and in this case we omit it from our notation, writing $A_\Theta$ and so on. We assume that $\Theta \subseteq \mathbb{R}^d$ is a compact semianalytic set of dimension $d$, i.e. $\Theta = \{x \in \mathbb{R}^d : g_1(x) \ge 0, \ldots, g_l(x) \ge 0\}$, where the $g_i$ are analytic functions.

Definition 2.1 (The real log-canonical threshold). Given a compact semianalytic set $\Theta \subseteq \mathbb{R}^d$ such that $\dim \Theta = d$, a real analytic function $f \in A^{\ge}_\Theta$ and a smooth positive function $\varphi: \mathbb{R}^d \to \mathbb{R}$, consider the zeta function defined as

$$\zeta(z) = \int_\Theta (f(\theta))^{-z} \varphi(\theta)\, d\theta, \qquad z \in \mathbb{C}. \tag{4}$$

This function extends to a meromorphic function in $z$ on the entire complex plane (cf. Theorem 2.4 in [21]). The real log-canonical threshold of $f$, denoted by $\mathrm{rlct}_\Theta(f;\varphi)$, is the smallest pole of $\zeta(z)$. By $\mathrm{mult}_\Theta(f;\varphi)$ we denote the multiplicity of this pole. By convention, if $\zeta(z)$ has no poles then $\mathrm{rlct}_\Theta(f;\varphi) = \infty$ and $\mathrm{mult}_\Theta(f;\varphi) = d$. If $\varphi(\theta) = 1$ then we omit $\varphi$ from the notation, writing $\mathrm{rlct}_\Theta(f)$ and $\mathrm{mult}_\Theta(f)$. Define $\mathrm{RLCT}_\Theta(f;\varphi)$ to be the pair $(\mathrm{rlct}_\Theta(f;\varphi), \mathrm{mult}_\Theta(f;\varphi))$, and order these pairs so that $(r_1, m_1) > (r_2, m_2)$ if $r_1 > r_2$, or $r_1 = r_2$ and $m_1 < m_2$.
To show that the real log-canonical threshold is well defined we need to show that if $\zeta(z)$ has poles then the minimal pole always exists. This is easy to see if $f$ and $\varphi$ are monomial functions, as in the example below.

Example 2.2. Let $f: \mathbb{R}^d \to \mathbb{R}$ be such that $f(x) = x^{2u} = x_1^{2u_1} \cdots x_d^{2u_d}$ and $\varphi(x) = x^h = x_1^{h_1} \cdots x_d^{h_d}$, where $u, h \in \mathbb{N}^d$. If $\Theta = [-\varepsilon, \varepsilon]^d$ is an $\varepsilon$-box around the origin in $\mathbb{R}^d$, then the zeta function in (4) becomes

$$\zeta(z) = \int_{[-\varepsilon, \varepsilon]^d} x^{-2uz + h}\, dx = C(\varepsilon) \prod_{i=1}^{d} \frac{1}{h_i + 1 - 2u_i z},$$

where $C(\varepsilon)$ collects factors depending on $\varepsilon$ that have no poles in $z$. Hence the poles of $\zeta(z)$ are the positive rational numbers $\frac{h_i + 1}{2u_i}$ for those $i = 1, \ldots, d$ with $u_i > 0$. In this case the smallest pole is the minimum of these numbers, and its multiplicity is the number of times the minimum occurs.
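The recipe of Example 2.2 is mechanical enough to sketch in a few lines. The helper below (its name is ours) only reads off the candidate poles $(h_i + 1)/(2u_i)$ with exact rational arithmetic and applies the convention $(\infty, d)$ of Definition 2.1 when there are none; it does not evaluate the integral itself.

```python
from fractions import Fraction


def monomial_rlct(u, h):
    """Pair (rlct, mult) for f(x) = x^(2u), phi(x) = x^h near the origin (Example 2.2).

    Every coordinate with u_i > 0 contributes a pole at (h_i + 1) / (2 u_i);
    coordinates with u_i = 0 contribute none.
    """
    poles = [Fraction(hi + 1, 2 * ui) for ui, hi in zip(u, h) if ui > 0]
    if not poles:
        return (float("inf"), len(u))  # convention of Definition 2.1
    smallest = min(poles)
    return (smallest, poles.count(smallest))


print(monomial_rlct((1, 1), (0, 0)))  # f = x1^2 x2^2: threshold 1/2, multiplicity 2
print(monomial_rlct((1, 2), (0, 0)))  # f = x1^2 x2^4: threshold 1/4, multiplicity 1
```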

The computation of the poles of $\zeta(z)$ and their multiplicities is linked to the asymptotic expansion of the Laplace integral

$$I(N) = \int_\Theta e^{-N f(\theta)} \varphi(\theta)\, d\theta \tag{5}$$

for large values of the parameter $N$. This theory was developed independently in Section 7.2 of [1] and in Sections 2.4 and 6.2 of [21]. The following theorem gives this relation. In Section 2.2 we show how it can be used to obtain the BIC approximation in a fairly general statistical setting, which will later be specialized to general Markov models for binary data.

Theorem 2.3. Let $\Theta$ be a compact semianalytic subset of $\mathbb{R}^d$ and $f \in A^{\ge}_\Theta(\mathbb{R}^d)$. Let $I(N)$ be defined as in (5). Then as $N \to \infty$

$$\log I(N) = -\mathrm{rlct}_\Theta(f;\varphi) \log N + (\mathrm{mult}_\Theta(f;\varphi) - 1) \log\log N + O(1).$$

Proof. This is a special case of Theorem 4.2 in [11] with $r = 1$ and $f_1 = f$. □
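Theorem 2.3 can be checked numerically on a toy example that is not taken from the paper: $f(x, y) = x^2 y^2$ on $\Theta = [0, 1]^2$ with $\varphi \equiv 1$. Here $\zeta(z) = \int\!\!\int x^{-2z} y^{-2z}\, dx\, dy = (1 - 2z)^{-2}$, so the threshold is $1/2$ with multiplicity 2, and the theorem predicts $\log I(N) = -\tfrac12 \log N + \log\log N + O(1)$. The sketch below reduces $I(N)$ to a one-dimensional integral of $\operatorname{erf}(t)/t$ using the closed form of the inner Gaussian integral, then applies Simpson's rule; the printed quantity $\log I(N) + \tfrac12\log N - \log\log N$ should stay bounded as $N$ grows.

```python
import math


def laplace_integral(n: float, m: int = 2000) -> float:
    """I(N) = int over [0,1]^2 of exp(-N x^2 y^2) dx dy, for the toy f(x, y) = x^2 y^2.

    Inner integral in closed form:
        int_0^1 exp(-N x^2 y^2) dy = sqrt(pi) * erf(x sqrt(N)) / (2 x sqrt(N)).
    The substitution t = x sqrt(N) in the outer integral gives
        I(N) = sqrt(pi) / (2 sqrt(N)) * int_0^T erf(t)/t dt,   T = sqrt(N).
    """
    big_t = math.sqrt(n)
    cut = min(big_t, 6.0)  # for t >= 6, erf(t) rounds to 1.0 in double precision

    def g(t: float) -> float:
        # erf(t)/t, extended continuously by its limit 2/sqrt(pi) at t = 0
        return 2.0 / math.sqrt(math.pi) if t == 0.0 else math.erf(t) / t

    # composite Simpson rule on [0, cut] with m (even) subintervals
    step = cut / m
    total = g(0.0) + g(cut)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(k * step)
    inner = total * step / 3.0
    if big_t > cut:
        inner += math.log(big_t / cut)  # tail: the integrand is 1/t there
    return math.sqrt(math.pi) / (2.0 * big_t) * inner


for n in (1e4, 1e6, 1e8, 1e10):
    val = math.log(laplace_integral(n)) + 0.5 * math.log(n) - math.log(math.log(n))
    print(f"N = {n:.0e}: log I(N) + 0.5 log N - log log N = {val:.4f}")
```

The drift in the printed column is the slowly vanishing $O(1/\log N)$ correction; the $\log\log N$ term from the multiplicity 2 is what keeps the column from diverging.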

To compute the real log-canonical threshold we split the integral in (4) into a sum of finitely many integrals over small neighbourhoods $\Theta_0$ of points $\theta_0 \in \Theta$. We can always do this using a partition of unity, since $\Theta$ is compact (see e.g. §16, [?]). For each of the local integrals we use Hironaka's theorem, stated below, to reduce it to a locally monomial case which can be dealt with easily as in Example 2.2. The version of Hironaka's theorem used in this paper was first formulated in [2].

Theorem 2.4 (Hironaka's theorem). Let $f: \mathbb{R}^d \to \mathbb{R}$ be a real analytic function in a neighborhood of the origin such that $f(0) = 0$. Then there exist a neighborhood $W$ of the origin and a proper real analytic map $\pi: U \to W$, where $U$ is a $d$-dimensional real analytic manifold, such that

(1) The map $\pi$ is an isomorphism between $U \setminus U_0$ and $W \setminus W_0$, where $W_0 = \{x \in W : f(x) = 0\}$ and $U_0 = \{u \in U : f(\pi(u)) = 0\}$.

(2) For an arbitrary point $P \in U_0$, there is a local coordinate system $u = (u_1, \ldots, u_d)$ of $U$ in which $P$ is the origin and
$$f(\pi(u)) = a(u)\, u_1^{r_1} \cdots u_d^{r_d},$$
where $a(u)$ is a nowhere vanishing function on this local chart and $r_1, \ldots, r_d$ are nonnegative integers, and the Jacobian determinant of $x = \pi(u)$ is
$$\pi'(u) = b(u)\, u_1^{h_1} \cdots u_d^{h_d},$$
where again $b(u) \neq 0$ and $h_1, \ldots, h_d$ are nonnegative integers.

Moreover, $\pi$ can always be obtained as a composition of blow-ups along smooth centers.

For the construction of the blow-up see for example Section 3.5 in [21]. 

The local computations are performed as follows. Let $\theta_0 \in \Theta$ and let $W_0$ be any sufficiently small open ball around $\theta_0$ in $\mathbb{R}^d$. Then, by Theorem 2.4 in [21], $\mathrm{RLCT}_{W_0}(f;\varphi)$ does not depend on the choice of $W_0$, and hence it is denoted by $\mathrm{RLCT}_{\theta_0}(f;\varphi)$. Formally, for this local computation we consider $f$ centered at $\theta_0$, i.e. the function $f(\theta + \theta_0)$. If $f(\theta_0) \neq 0$ then $\mathrm{RLCT}_{\theta_0}(f;\varphi) = (\infty, d)$, and hence we can restrict attention to points $\theta_0$ such that $f(\theta_0) = 0$. In this case, by Hironaka's theorem,

$$\int_{W_0} (f(\theta))^{-z} \varphi(\theta)\, d\theta = \int_W (f(\theta + \theta_0))^{-z} \varphi(\theta + \theta_0)\, d\theta = \sum_P \int_{U_P} u^{h - rz}\, c_P(u)\, du,$$

where $W$ is the neighbourhood $W_0$ translated to the origin, the (finite) sum is over local charts $U_P$ as in the theorem which together cover $\pi^{-1}(W)$, and the $c_P$ are nowhere vanishing functions on $U_P$. Then for each of the charts we do computations as in Example 2.2. Consequently

$$\mathrm{rlct}_{\theta_0}(f;\varphi) = \min_P \min_{i:\, r_i > 0} \frac{h_i + 1}{r_i}, \qquad \mathrm{mult}_{\theta_0}(f;\varphi) = \max_P \#\Big\{ i \in \{1, \ldots, d\} : \frac{h_i + 1}{r_i} = \mathrm{rlct}_{\theta_0}(f;\varphi) \Big\},$$

where the exponents $r_i, h_i$ depend on the chart $P$. In particular $\mathrm{rlct}_{\theta_0}(f;\varphi)$ is always a positive rational number and $\mathrm{mult}_{\theta_0}(f;\varphi)$ is a nonnegative integer, which shows that Definition 2.1 makes sense. Moreover, by Theorem 2.4 in [21] the real log-canonical threshold does not depend on the triple $(W, U, \pi)$.
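Assuming the resolution data are already known, that is, the exponent vectors $r$ and $h$ of each chart (finding a resolution is the hard part and is not attempted here), the bookkeeping above can be sketched as follows. The function name and the list-of-pairs input format are our own; the comparison implements the ordering of pairs from Definition 2.1.

```python
from fractions import Fraction


def rlct_from_charts(charts):
    """Combine local monomial charts into the pair (rlct, mult).

    charts: list of (r, h), where f(pi(u)) ~ u^r and the Jacobian ~ u^h on that chart.
    Candidate poles are (h_i + 1) / r_i for r_i > 0. The threshold is the minimum over
    all charts and coordinates; among charts attaining it, the largest count wins.
    """
    best = None
    for r, h in charts:
        poles = [Fraction(hi + 1, ri) for ri, hi in zip(r, h) if ri > 0]
        if not poles:
            continue
        pair = (min(poles), poles.count(min(poles)))
        # smaller threshold wins; for equal thresholds the larger multiplicity wins
        if best is None or pair[0] < best[0] or (pair[0] == best[0] and pair[1] > best[1]):
            best = pair
    if best is None:
        return (float("inf"), len(charts[0][0]))  # convention (infinity, d)
    return best


# blow-up of the origin for f = x^2 + y^2 + z^2 (Remark 2.5): in each of the three
# charts f(pi(u)) = u_k^2 * (unit) and the Jacobian is u_k^2
sphere = [((2, 0, 0), (2, 0, 0)), ((0, 2, 0), (0, 2, 0)), ((0, 0, 2), (0, 0, 2))]
print(rlct_from_charts(sphere))  # threshold 3/2, multiplicity 1
```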

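The local computations give the answer to the global question since, by [11, Proposition 2.5], the set of pairs $\mathrm{RLCT}_{\Theta_0}(f;\varphi)$ for $\theta_0 \in \Theta$ has a minimum and

$$\mathrm{RLCT}_\Theta(f;\varphi) = \min_{\theta_0 \in \Theta} \mathrm{RLCT}_{\Theta_0}(f;\varphi), \tag{6}$$

where $\Theta_0 = W_0 \cap \Theta$. For each $\theta_0 \in \Theta$, to compute $\mathrm{RLCT}_{\Theta_0}(f;\varphi)$ we consider two cases. If $\theta_0$ lies in the interior of $\Theta$ then we can assume $\Theta_0 = W_0$, and hence $\mathrm{RLCT}_{\Theta_0}(f;\varphi) = \mathrm{RLCT}_{\theta_0}(f;\varphi)$. If $\theta_0 \in \mathrm{bd}(\Theta)$, where $\mathrm{bd}(\Theta)$ denotes the boundary of $\Theta$, the computations may change significantly because the real log-canonical threshold depends on the boundary conditions (cf. Example 2.7 in [11]). Nevertheless it can be shown that, at least if there exists an open subset $U \subseteq \mathbb{R}^d$ such that $\Theta \subseteq U$ and $f \in A^{\ge}_U(\mathbb{R}^d)$, then

$$\mathrm{RLCT}_{\Theta_0}(f) \ge \mathrm{RLCT}_{\theta_0}(f). \tag{7}$$

Indeed, in this case

$$\int_{W_0} (f(\theta))^{-z}\, d\theta = \int_{\Theta_0} (f(\theta))^{-z}\, d\theta + \int_{W_0 \setminus \Theta_0} (f(\theta))^{-z}\, d\theta,$$

which implies that

$$\mathrm{RLCT}_{\theta_0}(f) = \min\{\mathrm{RLCT}_{\Theta_0}(f), \mathrm{RLCT}_{W_0 \setminus \Theta_0}(f)\}.$$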
If $f \in A^{\ge}_\Theta(\mathbb{R}^d)$, let $\hat\Theta := f^{-1}(0)$. By Definition 2.1, $\mathrm{RLCT}_{\Theta_0}(f) = (\infty, d)$ for all $\theta_0 \notin \hat\Theta$, and hence we can restrict ourselves to points in $\hat\Theta$. Therefore, whenever $\hat\Theta \neq \emptyset$ we have

$$\mathrm{RLCT}_\Theta(f) = \min_{\theta_0 \in \hat\Theta} \mathrm{RLCT}_{\Theta_0}(f). \tag{8}$$

Remark 2.5. Note that there is a substantial difference between the real log-canonical threshold and the log-canonical threshold, which is an important invariant used in algebraic geometry (see e.g. [10, Section 9.3.B]). Let $f \in \mathbb{R}[x_1, \ldots, x_d]$ be a polynomial with real coefficients. By $f_{\mathbb{C}}$ we denote its complexification, i.e. the same polynomial but as an element of $\mathbb{C}[x_1, \ldots, x_d]$. Saito [17] showed that $\mathrm{rlct}(f) \ge \mathrm{lct}(f_{\mathbb{C}})$. As an example let $f(x, y, z) = x^2 + y^2 + z^2$. By Kollár [9, Example 8.15] we have $\mathrm{lct}_0(f_{\mathbb{C}}) = 1$, and we can easily show that over the real numbers a single blow-up at the origin (see e.g. [21, Section 3.5]) allows us to compute the poles of $\zeta(z)$ (cf. Proposition 3.3 in [17]), giving $\mathrm{rlct}_0(f) = 3/2$.

2.2. The marginal likelihood. Let $X$ be a discrete random variable with values in $[m]$ for some $m \ge 1$. A distribution of $X$ is given by $(\mathbb{P}(X = 1), \ldots, \mathbb{P}(X = m))$. Denoting $\mathbb{P}(X = i)$ by $p_i$, we associate each probability distribution for $X$ with a point $p = (p_1, \ldots, p_m)$ in the probability simplex

$$\Delta_{m-1} = \Big\{x \in \mathbb{R}^m : x_i \ge 0,\ \sum_{i=1}^{m} x_i = 1\Big\}.$$

By definition a model for $X$ is a family of points in $\Delta_{m-1}$. The model analysed in this paper is a special case of a parametric algebraic statistical model, defined as the image in $\Delta_{m-1}$ of a polynomial mapping $p: \Theta \to \Delta_{m-1}$, where $\Theta \subseteq \mathbb{R}^d$ is called the parameter space (see e.g. Chapter 1, [14]). We define $\mathcal{M} = p(\Theta)$. Note that for a given integer $N$ every point $q \in \Delta_{m-1}$ gives a multinomial distribution $\mathrm{Mult}(N, q)$. Hence, given a fixed $N$, we can naturally associate $\Delta_{m-1}$ with the multinomial model, and $\mathcal{M}$ can be treated as a submodel of the multinomial model.

Let $X^{(N)} = (X^1, \ldots, X^N)$ denote $N$ independent observations of $X$ and let $(N_i)$ for $i \in [m]$ be the sufficient statistic given by the sample counts. Let $\hat p^{(N)} = [\hat p^{(N)}_i]$ denote the sample proportions $\hat p^{(N)}_i = N_i / N$. Since the observations in $X^{(N)}$ are independent, we can write the logarithm of the marginal likelihood $Z(N)$ as a function of $\hat p^{(N)}$. Let $\ell(p(\theta); X) = \log L(\theta; X)$ be the log-likelihood for a single observation. Then the observed log-likelihood of the data can be rewritten as

$$\ell_N(p(\theta)) = \sum_{\alpha \in \{0,1\}^n} N_\alpha \log p_\alpha(\theta) = N\, \ell(p(\theta); \hat p^{(N)}). \tag{9}$$

If the sample proportions lie in the interior of the probability simplex, then the likelihood function of the multinomial model, as a function of the probabilities, is always maximized at $\hat p^{(N)}$. Hence if $\hat p^{(N)} \in \mathcal{M}$, the likelihood function constrained to $\mathcal{M}$ is also maximized at $\hat p$. It follows that, under assumption (A2), for sufficiently large $N$ the maximum likelihood estimates are all the points in the parameter space mapping to $\hat p$, which we denote by $\hat\Theta = p^{-1}(\hat p) \subseteq \Theta$. For given $\hat p$ define the normalized log-likelihood as the function $f: \Theta \to \mathbb{R}$ given by

$$f(\theta) = f(p(\theta); \hat p) = \ell(\hat p; \hat p) - \ell(p(\theta); \hat p) \ge 0. \tag{10}$$
Then $Z(N)$ in (1) can be rewritten as $\exp(\ell_N) \cdot I(N)$, where

$$I(N) = \int_\Theta \exp\{-N f(\theta)\}\, \varphi(\theta)\, d\theta. \tag{11}$$

The logarithm of the marginal likelihood can be written as $\log Z(N) = \ell_N + \log I(N)$, where $\ell_N = \ell_N(\hat p)$. By construction $f \in A^{\ge}_\Theta$ and $f^{-1}(0) = p^{-1}(\hat p) = \hat\Theta$. By Theorem 2.3, to obtain the asymptotic approximation for $\log I(N)$, and hence also for $\log Z(N)$, we need to compute $\mathrm{RLCT}_\Theta(f;\varphi)$.
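The factorization $Z(N) = \exp(\ell_N)\, I(N)$ rests on the identity $\ell_N(p(\theta)) = \ell_N(\hat p) - N f(\theta)$, and $f$ in (10) is just the Kullback-Leibler divergence of $p(\theta)$ from $\hat p$. A small sketch with made-up counts (illustrative only, not data from the paper; function names are ours):

```python
import math


def log_likelihood(counts, probs):
    """l_N(p) = sum_alpha N_alpha log p_alpha, as in (9); empty cells contribute nothing."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def normalized_log_likelihood(p_hat, probs):
    """f = l(p_hat; p_hat) - l(p; p_hat) from (10), i.e. KL(p_hat || p) >= 0."""
    return sum(q * (math.log(q) - math.log(p)) for q, p in zip(p_hat, probs) if q > 0)


counts = [30, 20, 40, 10]
N = sum(counts)
p_hat = [c / N for c in counts]
p_theta = [0.25, 0.25, 0.25, 0.25]  # some point p(theta) of the model
f = normalized_log_likelihood(p_hat, p_theta)
# l_N(p(theta)) and l_N(p_hat) - N * f coincide up to rounding
print(log_likelihood(counts, p_theta), log_likelihood(counts, p_hat) - N * f)
```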

Remark 2.6. Note that we are interested in the approximation of the observed marginal likelihood, not the expected one. Therefore we cannot immediately apply the main result of Watanabe as stated in Theorem 1.1 in [11] (for a discussion see [22]).

In our analysis of general Markov models we distinguish two cases: the smooth case, when there exists a smooth manifold $M$ such that $\hat\Theta = M \cap \Theta$, and the singular case. The smooth case is simple to deal with. We can use the real log-canonical threshold to show that the BIC approximation in (2) generalizes to (3) when $\hat\Theta$ is a sufficiently regular subset of $\Theta$. We make this precise in the following proposition.

Proposition 2.7. Let $f \in A^{\ge}_\Theta(\mathbb{R}^d)$ be the normalized log-likelihood in (10). Given (A1) and (A2), assume that $\hat p = \hat p^{(N)} \in \mathcal{M}$ is such that there exists a smooth manifold $M \subseteq \mathbb{R}^d$ satisfying $\hat\Theta = M \cap \Theta$. Then as $N \to \infty$

$$\log Z(N) = \ell_N - \frac{d - d'}{2} \log N + O(1),$$

where $d' = \dim \hat\Theta$.

Proof. By assumption (A1) there exist two constants $c, C > 0$ such that $c < \varphi(\theta) < C$ on $\Theta$. Therefore, for real $z$,

$$c \int_\Theta (f(\theta))^{-z}\, d\theta \le \zeta(z) \le C \int_\Theta (f(\theta))^{-z}\, d\theta,$$

and it follows that $\mathrm{RLCT}_\Theta(f;\varphi) = \mathrm{RLCT}_\Theta(f)$. By Theorem 2.3 it suffices to prove the following lemma, which generalizes Proposition 3.3 in [17].

Lemma 2.8. Let $\Theta \subseteq \mathbb{R}^d$ be a compact semianalytic set and $f \in A^{\ge}_\Theta(\mathbb{R}^d)$. If there exists a smooth manifold $M \subseteq \mathbb{R}^d$ such that $\hat\Theta = M \cap \Theta$ and $\theta_0 \in \hat\Theta$, then $\mathrm{RLCT}_{\Theta_0}(f) = \mathrm{RLCT}_\Theta(f) = \big(\frac{d - d'}{2}, 1\big)$, where $d' = \dim \hat\Theta$.

To prove this, recall that the real log-canonical threshold $\mathrm{RLCT}_{\theta_0}(f)$ does not depend on the choice of a neighborhood $W_0$ of $\theta_0$. Since $\hat\Theta = M \cap \Theta$ and $M$ is a smooth manifold, for each point $\theta_0$ of $\hat\Theta$ there exists an open neighborhood $W_0$ of $\theta_0$ in $\mathbb{R}^d$ with local coordinates $w_1, \ldots, w_d$ centered at $\theta_0$ such that the local equation of $\hat\Theta$ is $w_1^2 + \cdots + w_c^2 = 0$, where $c = d - d'$. A single blow-up $\pi$ along $\{w_1 = \cdots = w_c = 0\}$ satisfies all the conditions of Hironaka's theorem, since in the new coordinates over one of the charts $f(\pi(u)) = u_1^2 a(u)$, where $a(u)$ is nowhere vanishing, and $\pi'(u) = u_1^{c-1}$. For the other charts the situation is the same, and hence $\mathrm{RLCT}_{\theta_0}(f) = (c/2, 1)$. Since by (6) $\mathrm{RLCT}_\Theta(f) = \min_{\theta_0 \in \hat\Theta} \mathrm{RLCT}_{W_0 \cap \Theta}(f)$, it suffices to show that if $\theta_0$ is a boundary point of $\Theta$ then $\mathrm{RLCT}_{W_0 \cap \Theta}(f) \ge (c/2, 1)$. But this follows from (7) and the fact that $\mathrm{RLCT}_{\theta_0}(f) = (c/2, 1)$, as $\theta_0$ is a smooth point of $M$. The lemma is hence proved. □
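To make the chart computation concrete, here it is written out under the simplifying assumption that near $\theta_0$ the function $f$ is exactly $w_1^2 + \cdots + w_c^2$ in the coordinates above (in general a nowhere vanishing factor appears, which does not affect the poles). We take the chart $U_1$ of the blow-up along $\{w_1 = \cdots = w_c = 0\}$ in which $w_1$ is the exceptional coordinate:

```latex
\begin{aligned}
  & w_1 = u_1, \qquad w_j = u_1 u_j \ (2 \le j \le c), \qquad w_j = u_j \ (c < j \le d),\\
  & f(\pi(u)) = u_1^{2}\,\bigl(1 + u_2^{2} + \cdots + u_c^{2}\bigr),
    \qquad |\pi'(u)| = |u_1|^{c-1},\\
  & \int_{U_1} \bigl(f(\pi(u))\bigr)^{-z}\,|\pi'(u)|\,du
    = \int_{U_1} |u_1|^{-2z + c - 1}\,\bigl(1 + u_2^{2} + \cdots + u_c^{2}\bigr)^{-z}\,du .
\end{aligned}
```

The factor $(1 + u_2^2 + \cdots + u_c^2)^{-z}$ never vanishes, so, as in Example 2.2 with exponent $2$ on $u_1$ and $h_1 = c - 1$, the only pole comes from $u_1$ and sits at $z = \frac{(c - 1) + 1}{2} = \frac{c}{2}$ with multiplicity one. For $c = d = 3$ this is the value $\mathrm{rlct}_0(x^2 + y^2 + z^2) = 3/2$ quoted in Remark 2.5.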
3. General Markov models

In this section we formally define the general Markov model and give the asymptotic approximation for the marginal likelihood in the smooth case, which is given by Theorem 1.1.

3.1. Definition of the model class. All random variables considered in this paper are assumed to be binary with values in either 0 or 1. Let $T^r = (V, E)$ be a rooted tree. Recall that $n_e = |E|$ and $n_v = |V|$. For any $e = (k, l) \in E$ we say that $k$ and $l$ are adjacent and that $k$ is a parent of $l$, and we denote this by $k = \mathrm{pa}(l)$. For every $\beta \in \{0,1\}^V$ let $p_\beta = \mathbb{P}(\bigcap_{v \in V} \{Y_v = \beta_v\})$. A Markov process on a rooted tree $T^r$ is a sequence $Y = (Y_v)_{v \in V}$ of random variables such that for each $\beta = (\beta_v)_{v \in V} \in \{0,1\}^V$

$$p_\beta(\theta) = \theta^{(r)}_{\beta_r} \prod_{v \in V \setminus r} \theta^{(v)}_{\beta_v | \beta_{\mathrm{pa}(v)}}, \tag{12}$$

where $\theta^{(v)}_{\beta_v | \beta_{\mathrm{pa}(v)}} = \mathbb{P}(Y_v = \beta_v \mid Y_{\mathrm{pa}(v)} = \beta_{\mathrm{pa}(v)})$ and $\theta^{(r)}_{\beta_r} = \mathbb{P}(Y_r = \beta_r)$. In more standard statistical language, these models are just fully observed Bayesian networks on rooted trees. Since $\theta^{(v)}_{0|i} + \theta^{(v)}_{1|i} = 1$ for all $v \in V \setminus r$ and $i = 0, 1$, the Markov process on $T^r$ defined by Equation (12) has exactly $2n_e + 1$ free parameters: one for the root distribution, $\theta^{(r)}_1$, and two for each edge $(u, v) \in E$, given by $\theta^{(v)}_{1|0}$ and $\theta^{(v)}_{1|1}$. The vector of all parameters is denoted by $\theta$. The parameter space is $\Theta_T = [0, 1]^{2n_e + 1}$. Henceforth we usually omit the root $r$ from the notation, writing $T$ to denote the rooted tree $T^r$.

The general Markov model on $T$ is induced from the Markov process on $T$ by assuming that all the inner nodes represent hidden random variables. Hence we consider the induced marginal probability distributions over the leaves of $T$. The set of leaves is denoted by $L$. We assume that $T$ has $n$ leaves, and hence we can identify $L$ with $[n]$ via some arbitrary numbering of the leaves. Let $Y = (X, H)$, where $X = (X_1, \ldots, X_n)$ denotes the variables represented by the leaves of $T$ and $H$ denotes the vector of variables represented by the inner nodes, i.e. $X = (Y_v)_{v \in L}$ and $H = (Y_v)_{v \in V \setminus L}$. We define the general Markov model $\mathcal{M}_T$ to be the model in the probability simplex $\Delta_{2^n - 1}$ obtained by summing out in (12) all possible values of the inner nodes. By definition $\mathcal{M}_T$ is the image of the map $p: \Theta_T \to \Delta_{2^n - 1}$ given by

$$p_\alpha(\theta) = \sum_{\beta \in \mathcal{H}} \theta^{(r)}_{\beta_r} \prod_{v \in V \setminus r} \theta^{(v)}_{\beta_v | \beta_{\mathrm{pa}(v)}} \quad \text{for any } \alpha \in \{0,1\}^L, \tag{13}$$

where $\mathcal{H}$ is the set of all vectors $\beta = (\beta_v)_{v \in V}$ such that $(\beta_v)_{v \in L} = \alpha$. For a more detailed treatment see Chapter 8 in [19].
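A brute-force sketch of the parameterization (12)-(13) for a small tree may help. The tree, the vertex labels, the parameter values, and the encoding of $\theta^{(v)}$ as the pair $(\theta^{(v)}_{1|0}, \theta^{(v)}_{1|1})$ are illustrative choices of ours; the code simply enumerates all $2^{n_v}$ states, so it is only meant for tiny trees.

```python
from itertools import product


def gm_model_probs(parent, root, leaves, theta_root, theta_edge):
    """Leaf marginal p_alpha(theta) of the general Markov model, eq. (13).

    parent:     dict mapping each non-root vertex v to pa(v)
    theta_root: P(Y_r = 1)
    theta_edge: dict v -> (P(Y_v = 1 | Y_pa(v) = 0), P(Y_v = 1 | Y_pa(v) = 1))
    """
    vertices = [root] + sorted(parent)
    probs = {}
    for beta in product((0, 1), repeat=len(vertices)):
        state = dict(zip(vertices, beta))
        # joint probability of the fully observed Markov process, eq. (12)
        p = theta_root if state[root] == 1 else 1.0 - theta_root
        for v, pa in parent.items():
            q1 = theta_edge[v][state[pa]]
            p *= q1 if state[v] == 1 else 1.0 - q1
        # sum out the inner nodes: accumulate by the leaf pattern alpha
        alpha = tuple(state[leaf] for leaf in leaves)
        probs[alpha] = probs.get(alpha, 0.0) + p
    return probs


# quartet tree rooted at inner node "a": a -> x1, x2, b and b -> x3, x4
parent = {"b": "a", "x1": "a", "x2": "a", "x3": "b", "x4": "b"}
leaves = ["x1", "x2", "x3", "x4"]
probs = gm_model_probs(parent, "a", leaves, 0.3, {v: (0.2, 0.7) for v in parent})
print(len(probs), sum(probs.values()))   # 16 leaf patterns, total probability 1
print(1 + 2 * len(parent))               # 2 n_e + 1 = 11 free parameters
```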

3.2. The smooth case. For $p \in \mathcal{M}_T$ let $\Sigma = [\mu_{ij}] \in \mathbb{R}^{n \times n}$ be the covariance matrix of the random vector represented by the leaves of $T$. In [25] we show that the geometry of the $p$-fiber $\hat\Theta_T$ is determined by the zeros in $\Sigma$. We say that an edge $e \in E$ is isolated relative to $p$ if $\mu_{ij} = 0$ for all $i, j \in [n]$ such that $e \in E(ij)$, where $E(ij)$ denotes the set of edges in the path joining $i$ and $j$. By $\hat E \subseteq E$ we denote the set of all edges of $T$ which are isolated relative to $p$. By $\hat T = (V, E \setminus \hat E)$ we denote the forest obtained from $T$ by removing the edges in $\hat E$.
We now define relations on $\hat E$ and $E \setminus \hat E$. For two edges $e, e'$ with either $\{e, e'\} \subseteq \hat E$ or $\{e, e'\} \subseteq E \setminus \hat E$, write $e \sim e'$ if either $e = e'$, or $e$ and $e'$ are adjacent and all the edges that are incident with both $e$ and $e'$ are isolated relative to $p$. Let us now take the transitive closure of $\sim$ restricted to pairs of edges in $\hat E$ to form an equivalence relation on $\hat E$. Similarly, take the transitive closure of $\sim$ restricted to pairs of edges in $E \setminus \hat E$ to form an equivalence relation on $E \setminus \hat E$. We let $[\hat E]$ and $[E \setminus \hat E]$ denote the sets of equivalence classes of $\hat E$ and $E \setminus \hat E$, respectively.

By construction, all the inner nodes of $T$ have either degree zero in $\hat T$ or degree strictly greater than one. We say that a node $v \in V$ is nondegenerate with respect to $p$ if either $v$ is a leaf of $T$ or $\deg v \ge 2$ in $\hat T$. Otherwise we say that the node is degenerate with respect to $p$. Note that this coincides with the definition of a degenerate node given in the introduction. The set of all nodes which are degenerate with respect to $p$ is denoted by $\hat V$.
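Once the zero pattern of $\Sigma$ is known, these definitions are purely combinatorial. The sketch below (helper names and the dictionary encoding of $\Sigma$ by unordered leaf pairs are our own choices) marks an edge as isolated when $\mu_{ij} = 0$ for every leaf pair whose path uses it, then reads the degenerate nodes $\hat V$ and the number $l_2$ of inner nodes of degree two off the degrees in the forest $\hat T$. Zeros are tested against a tolerance `tol`, an assumption needed for floating-point input.

```python
from collections import deque
from itertools import combinations


def path_edges(adj, s, t):
    """Edges (as frozensets) on the unique path between s and t in a tree."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in prev:
                prev[w] = v
                queue.append(w)
    edges, v = set(), t
    while prev[v] is not None:
        edges.add(frozenset((v, prev[v])))
        v = prev[v]
    return edges


def classify_nodes(edges, leaves, mu, tol=1e-12):
    """Return (isolated edges, degenerate inner nodes, l2) for a tree and covariances mu."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    # start with all edges isolated; an edge on a path with mu_ij != 0 is not isolated
    isolated = {frozenset(e) for e in edges}
    for i, j in combinations(leaves, 2):
        if abs(mu[frozenset((i, j))]) > tol:
            isolated -= path_edges(adj, i, j)
    # degree of each vertex in the forest obtained by deleting isolated edges
    deg = {v: sum(frozenset((v, w)) not in isolated for w in adj[v]) for v in adj}
    inner = [v for v in adj if v not in leaves]
    degenerate = sorted(v for v in inner if deg[v] == 0)
    l2 = sum(1 for v in inner if deg[v] == 2)
    return isolated, degenerate, l2


edges = [("a", "x1"), ("a", "x2"), ("a", "b"), ("b", "x3"), ("b", "x4")]
leaves = ["x1", "x2", "x3", "x4"]
# only the two cherries are correlated: mu_12 != 0 and mu_34 != 0
mu = {frozenset(p): 0.0 for p in combinations(leaves, 2)}
mu[frozenset(("x1", "x2"))] = mu[frozenset(("x3", "x4"))] = 0.1
print(classify_nodes(edges, leaves, mu))  # edge a-b isolated, no degenerate node, l2 = 2
```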

Proposition 3.1 ([25], Theorem 5.4). Let $T$ be a tree with $n$ leaves. Let $p \in \mathcal{M}_T$ and let $\hat T$ be defined as above. If each of the inner nodes of $T$ has degree at least two in $\hat T$, then $\hat\Theta_T$ is a manifold with corners and $\dim \hat\Theta_T = 2l_2$, where $l_2$ is the number of nodes which have degree two in $\hat T$.

For the case covered by Proposition 3.1 we obtain a way to compute the asymptotic approximation for the marginal likelihood.

Proposition 3.2. Let $\hat p \in \mathcal{M}_T$ be such that each inner node of $T$ has degree at least two in $\hat T$, and let $f$ be the normalized log-likelihood defined by (10). Then

$$\mathrm{RLCT}_{\Theta_T}(f) = \left(\frac{n_v + n_e - 2l_2}{2}, 1\right).$$

Proof. Since every inner node of $T$ has degree at least two in $\hat T$, by Proposition 3.1 there exists a smooth manifold $M \subseteq \mathbb{R}^{n_v + n_e}$ such that $\hat\Theta_T = M \cap \Theta_T$ and $\dim \hat\Theta_T = 2l_2$. The result follows from Proposition 2.7 and the fact that $\dim \Theta_T = n_v + n_e$. □



By Theorem 2.3, Proposition 3.2 implies Theorem 1.1, since $l_2$ in its statement is exactly the number of inner nodes $v$ such that the degree of $v$ in $\hat T$ is two.

Remark 3.3. Theorem 1.1 is still true if (A1) is replaced by the assumption that the prior distribution is bounded on $\Theta_T$ and there exists an open subset of $\Theta_T$ with a non-empty intersection with $\hat\Theta_T$ on which the prior is strictly positive. In particular we can use conjugate $\mathrm{Beta}(a, b)$ priors on the parameters as long as $a, b \ge 1$.



4. The ideal-theoretic approach 

In this section we define the real log-canonical threshold of an ideal. Theorem 4.2 translates the problem of finding the real log-canonical threshold of the normalized log-likelihood into algebra. We then analyse the case of general Markov models. In Theorem 4.6 we apply a useful change of coordinates which enables us to deal with the singular case more efficiently.
4.1. The real log-canonical threshold of an ideal. Let $f_1, \ldots, f_r \in A_\Theta$. The ideal generated by $f_1, \ldots, f_r$ is denoted by

$$\langle f_1, \ldots, f_r \rangle = \Big\{ f \in A_\Theta : f(\theta) = \sum_{i=1}^{r} h_i(\theta) f_i(\theta),\ h_i \in A_\Theta \Big\}.$$

Following [11] we generalize the notion of the real log-canonical threshold to the ideal $I = \langle f_1, \ldots, f_r \rangle$. This mirrors the analytic definition of the log-canonical threshold of an ideal (see e.g. [10, Section 9.3.D]). By definition

$$\mathrm{RLCT}_\Theta(I;\varphi) = \mathrm{RLCT}_\Theta(\langle f_1, \ldots, f_r \rangle;\varphi) := \mathrm{RLCT}_\Theta(f;\varphi), \tag{14}$$

where $f(\theta) = f_1^2(\theta) + \cdots + f_r^2(\theta)$. By Proposition 4.5 in [11] the real log-canonical threshold does not depend on the choice of generators. We say that $I = \langle f_1, \ldots, f_r \rangle$ is $\mathbb{R}$-nondegenerate if $f = \sum_i f_i^2$ is $\mathbb{R}$-nondegenerate as given by Definition 6.5.

The following important proposition enables us to use the full power of the ideal- 
theoretic approach. 

Proposition 4.1. Let $f, g \in A^{\ge}_\Theta(\mathbb{R}^d)$ and let $I$ be an ideal in $A_\Theta(\mathbb{R}^d)$. Then

i: Let $\rho: \Omega \to \Theta$ be a proper real analytic isomorphism. Denote by $\rho^* I = \{f \circ \rho : f \in I\}$ the pullback of $I$ to $A_\Omega$. Then
$$\mathrm{RLCT}_\Theta(I;\varphi) = \mathrm{RLCT}_\Omega(\rho^* I; (\varphi \circ \rho)|\rho'|),$$
where $|\rho'|$ denotes the Jacobian of $\rho$.
ii: If $\varphi$ is positive and bounded on $\Theta$, then
$$\mathrm{RLCT}_\Theta(I;\varphi) = \mathrm{RLCT}_\Theta(I).$$
iii: If there exist constants $c, c' > 0$ such that $c\, g(\theta) \le f(\theta) \le c'\, g(\theta)$ for every $\theta \in \Theta$, then $\mathrm{RLCT}_\Theta(f) = \mathrm{RLCT}_\Theta(g)$.
iv: Let $I = \langle f_1, \ldots, f_r \rangle$ and $J = \langle g_1, \ldots, g_r \rangle$, where $g_i = u_i f_i$ for $i = 1, \ldots, r$ and there exist positive constants $c, C$ such that $c < u_i(\theta) < C$ for all $\theta \in \Theta$ and all $i = 1, \ldots, r$. Then $\mathrm{RLCT}_\Theta(I) = \mathrm{RLCT}_\Theta(J)$.

Proof. The first result is a direct consequence of Proposition 4.7, [?]. The second and the third follow easily from the interpretation of the real log-canonical threshold and its multiplicity as coefficients in the asymptotic approximation of $I(N)$ in (5). The last statement follows from the third. □

In the statistical context given in Section 2.2, expressing this problem in the 
language of ideals simplifies reductions. 

Theorem 4.2. Let $p = (p_1, \ldots, p_m): \Theta \to \Delta_{m-1}$ be a polynomial mapping and $\mathcal{M} = p(\Theta)$ be the statistical model of $X \in [m]$. For a given $\hat p \in \mathcal{M}$ define

$$I = \langle p_1(\theta) - \hat p_1, \ldots, p_m(\theta) - \hat p_m \rangle \subseteq A_\Theta. \tag{15}$$

Let $f$ denote the normalized log-likelihood defined by (10) and let $\varphi$ be the prior distribution on $\Theta$, satisfying (A1). Then

$$\mathrm{RLCT}_\Theta(f;\varphi) = \mathrm{RLCT}_\Theta(I;\varphi) = \mathrm{RLCT}_\Theta(I). \tag{16}$$

Proof. The first equation follows essentially the same lines as the proof of Theorem 1.2 in [11]. The second equation follows from Proposition 4.1 ii. □
4.2. A reparameterization of the model. To see how the formulation in terms of ideals may be useful, we first need to introduce a new generating set for the ideal $I$ in (15). By Proposition 4.5 in [11] this does not affect our computations. Then we take a pullback of $I$ under a polynomial isomorphism. We note that this is the algebraized version of the analytic reductions applied in [16].

Following [25] we perform a change of coordinates on the model space and the parameter space. Let $T$ be a tree with $n$ leaves. In this case the change of the generating set for $I$ is induced by the following series of transformations. First we express the raw probabilities $p_\alpha$ for $\alpha\in\{0,1\}^n$ in terms of a new system of variables given by the non-central moments $\lambda_\alpha = \mathbb{E}X^\alpha$. This change is a simple linear map $f_{p\lambda}: \mathbb{R}^{2^n} \to \mathbb{R}^{2^n}$ with determinant equal to one. Thus $\lambda = f_{p\lambda}(p)$ is defined as follows

$$\lambda_\alpha = \sum_{\alpha \le \beta \le \mathbf{1}} p_\beta \quad\text{for any } \alpha\in\{0,1\}^n, \tag{17}$$

where $\mathbf{1}$ denotes here the vector of ones and the sum is over all binary vectors $\beta$ such that $\alpha \le \beta \le \mathbf{1}$ in the sense that $\alpha_i \le \beta_i \le 1$ for all $i = 1,\ldots,n$. To define the next change of coordinates we change the indexing of the non-central moments in such a way that $\lambda_\alpha$ for $\alpha\in\{0,1\}^n$ is denoted by $\lambda_A$ for $A\subseteq[n]$, where $i\in A$ if and only if $\alpha_i = 1$. The linearity of the expectation implies that the central moments can be expressed in terms of the non-central moments. We have

$$\mu_I = \sum_{J\subseteq I} (-1)^{|I\setminus J|}\,\lambda_J \prod_{i\in I\setminus J}\lambda_i \quad\text{for } I\subseteq[n],\ |I|\ge 2. \tag{18}$$
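To make (17) and (18) concrete, here is a minimal Python sketch. It assumes a joint distribution on $\{0,1\}^n$ is stored as a dictionary from binary tuples to probabilities and encodes subsets $A\subseteq[n]$ as sorted tuples of 0-based indices; the helper names `noncentral_moments` and `central_moment` are our own and not taken from [25].

```python
from itertools import combinations, product
from math import prod


def noncentral_moments(p, n):
    """Non-central moments lambda_A = E[prod_{i in A} X_i], as in (17).

    p maps binary tuples beta in {0,1}^n to probabilities; lambda_A is the sum
    of p_beta over all beta with beta_i = 1 for every i in A.
    Subsets A are sorted tuples of 0-based indices; lambda_() equals 1.
    """
    lam = {}
    for k in range(n + 1):
        for A in combinations(range(n), k):
            lam[A] = sum(pb for beta, pb in p.items() if all(beta[i] == 1 for i in A))
    return lam


def central_moment(lam, I):
    """Central moment via (18):
    mu_I = sum over J subset of I of (-1)^{|I minus J|} lambda_J prod_{i in I minus J} lambda_i.
    """
    total = 0.0
    for k in range(len(I) + 1):
        for J in combinations(I, k):
            rest = [i for i in I if i not in J]
            total += (-1) ** len(rest) * lam[J] * prod(lam[(i,)] for i in rest)
    return total


# Two positively dependent binary variables: mu_{12} is their covariance.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
lam = noncentral_moments(p, 2)
print(central_moment(lam, (0, 1)))

# Three independent coins: every mu_I with |I| >= 2 vanishes.
q = [0.3, 0.6, 0.8]
p3 = {b: prod(q[i] if b[i] else 1 - q[i] for i in range(3))
      for b in product((0, 1), repeat=3)}
lam3 = noncentral_moments(p3, 3)
print(central_moment(lam3, (0, 1, 2)))
```

The independent-coin case is a quick check of the sign pattern in (18): every central moment of order two or more should vanish up to rounding.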

Moreover, there is an algebraic isomorphism between the non-central moments $\lambda_I$ for all non-empty $I\subseteq[n]$ and all the means $\lambda_i = \mathbb{E}X_i$ supplemented with all the central moments $\mu_I = \mathbb{E}\big(\prod_{i\in I}(X_i - \lambda_i)\big)$ for $I\in[n]_{\ge2}$, where $[n]_{\ge k}$ denotes all subsets of $[n]$ with at least $k$ elements (see Appendix A in [25]). In particular we obtain in this way a change of coordinates from the non-central moments $\lambda_I$ for $I\subseteq[n]$ to the central moments supplemented by the means.

To specify the last change of coordinates we need some basic combinatorics. We define a partially ordered set (or poset) $\Pi_T$ of all partitions of the set of leaves $[n]$ obtained by removing some inner edges of $T$ and considering connected components of the resulting forest. The elements of $\Pi_T$ are partitions $\pi = B_1|\cdots|B_k$, where each $B_i$ is called a block of $\pi$. The ordering on this poset is induced from the ordering on the poset of all partitions of $[n]$ (see Example 3.1.1.d, [ ]). Thus for $\pi = B_1|\cdots|B_k$ and $\nu = C_1|\cdots|C_l$ we write $\pi \le \nu$ if and only if every block of $\pi$ is contained in one of the blocks of $\nu$.

For any poset $\Pi$ we define its Möbius function $m: \Pi\times\Pi \to \mathbb{R}$ (cf. [20, Chapter 3]) by setting

$$m(\pi,\pi) = 1 \ \text{ for every } \pi\in\Pi \quad\text{and}\quad m(\pi,\nu) = -\sum_{\pi\le\delta<\nu} m(\pi,\delta).$$

For any $I\subseteq[n]$ we define $T(I)$ as the minimal subtree of $T$ containing $I$ in its vertex set. By $m_I$ we denote the Möbius function of $\Pi_{T(I)}$ and by $1_I$ the maximal one-block partition of $\Pi_{T(I)}$.
The last system of coordinates is given by $n$ means $\lambda_i$ for $i = 1,\ldots,n$ and $\kappa_I$ for all $I\in[n]_{\ge2}$, where

$$\kappa_I = \sum_{\pi\in\Pi_{T(I)}} m_I(\pi, 1_I)\prod_{B\in\pi}\mu_B. \tag{19}$$

We note that in particular $\kappa_{ij} = \mu_{ij}$ for all $i,j\in[n]$. This change of coordinates, from the probabilities $p_\alpha$ for $\alpha\in\{0,1\}^n$ to $(\lambda_i, \kappa_I)$ for $i = 1,\ldots,n$ and $I\in[n]_{\ge2}$, is denoted by $f_{p\kappa}$. It is an algebraic isomorphism given that $\sum_\alpha p_\alpha = 1$ (see Appendix A, [25]). The new set of coordinates is called the system of tree cumulants.
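The Möbius function and formula (19) can be sketched in Python as follows. This is a minimal illustration under our own conventions: partitions are frozensets of frozensets, the poset $\Pi_{T(I)}$ is passed in explicitly (enumerating it from the tree is not implemented), and the names `mobius`, `refines` and `tree_cumulant` are hypothetical. For the quartet tree, where removing the single inner edge gives the split $12|34$, the poset $\Pi_{T(1234)}$ has just the two elements $12|34 < 1234$.

```python
from math import prod


def refines(pi, nu):
    """pi <= nu in the partition order: every block of pi lies inside a block of nu."""
    return all(any(B <= C for C in nu) for B in pi)


def mobius(elements, leq):
    """Mobius function of a finite poset: m(x, x) = 1 and
    m(x, y) = -sum_{x <= z < y} m(x, z); m(x, y) = 0 unless x <= y."""
    memo = {}

    def m(x, y):
        if (x, y) not in memo:
            if x == y:
                memo[(x, y)] = 1
            elif not leq(x, y):
                memo[(x, y)] = 0
            else:
                memo[(x, y)] = -sum(m(x, z) for z in elements
                                    if z != y and leq(x, z) and leq(z, y))
        return memo[(x, y)]

    return m


def partition(*blocks):
    """A set partition as a frozenset of frozenset blocks."""
    return frozenset(frozenset(B) for B in blocks)


def tree_cumulant(poset, top, mu):
    """kappa_I = sum_{pi in Pi_{T(I)}} m_I(pi, 1_I) prod_{B in pi} mu_B, as in (19).

    poset: the partitions in Pi_{T(I)}, supplied by the caller;
    top: the one-block partition 1_I; mu: central moments keyed by frozenset blocks.
    """
    m = mobius(poset, refines)
    return sum(m(pi, top) * prod(mu[B] for B in pi) for pi in poset)


# Quartet tree: Pi_{T(1234)} = {12|34, 1234}; arbitrary numerical moments.
top = partition({1, 2, 3, 4})
poset = [top, partition({1, 2}, {3, 4})]
mu = {frozenset({1, 2, 3, 4}): 0.5, frozenset({1, 2}): 0.2, frozenset({3, 4}): 0.3}
print(tree_cumulant(poset, top, mu))  # equals mu_1234 - mu_12 * mu_34
```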

Example 4.3. Consider the quartet tree model, i.e. the hidden tree Markov model given by the graph in Figure 4. The tree cumulants are given by 15 coordinates: $\lambda_i = \mathbb{E}X_i$ for $i = 1,2,3,4$ and $\kappa_I$ for $I\in[4]_{\ge2}$.

[Figure 4. A quartet tree]

We have $\kappa_{ij} = \mu_{ij} = \mathrm{Cov}(X_i, X_j)$ for $1\le i<j\le4$ and $\kappa_{ijk} = \mu_{ijk}$ for all $1\le i<j<k\le4$. However, tree cumulants of higher order cannot be equated to the corresponding central moments but only expressed as functions of them. Thus in this case by (19)

$$\kappa_{1234} = \mu_{1234} - \mu_{12}\mu_{34}.$$

The next step is to change the coordinates on the parameter space of the model. Define the following set of $n_v + n_e$ parameters. For every directed edge $(u,v)\in E$ let

$$\eta_{u,v} = \theta^{(v)}_{1|1} - \theta^{(v)}_{1|0}, \quad\text{and}\quad s_v = 1 - 2\lambda_v \ \text{ for each } v\in V, \tag{20}$$

where $\lambda_v = \mathbb{E}Y_v$ is a polynomial in the original parameters $\theta$. Let $(r, v_1, \ldots, v_k, v)$ be a directed path in $T$. Then

$$\lambda_v = \sum_{\alpha\in\{0,1\}^{k+1}} \theta^{(r)}_{\alpha_r}\,\theta^{(v_1)}_{\alpha_1|\alpha_r}\cdots\theta^{(v_k)}_{\alpha_k|\alpha_{k-1}}\,\theta^{(v)}_{1|\alpha_k}.$$

We denote the new parameter space by $\Omega_T$ and the coordinates by $\omega = ((s_v), (\eta_{u,v}))$ for $v\in V$, $(u,v)\in E$.

Simple linear constraints defining $\Theta_T$ become only slightly more complicated when expressed in the new parameters. The choice of parameter values is no longer free, in the sense that the constraints for each of the parameters involve other parameters. $\Omega_T$ is given by $s_r\in[-1,1]$ and for each $(u,v)\in E$ (cf. Equation (19) in [25])

$$-(1 + s_v) \le (1 - s_u)\,\eta_{u,v} \le \ \ldots \tag{21}$$
Since $T$ is a tree, $2n_e + 1 = n_v + n_e$ and hence $\dim\Omega_T = \dim\Theta_T$. The change of parameters defined above is denoted by $f_{\theta\omega}: \Theta_T \to \Omega_T$. It is a polynomial isomorphism with the inverse denoted by $f_{\omega\theta}$ (see [25, Section 4]). Recall that for $I\subseteq[n]$ by $T(I) = (V(I), E(I))$ we denote the subtree of $T$ spanned on $I$. Let $r(I)$ denote the root of $T(I)$. Then for instance if $T$ is the quartet tree in Figure 4 then for $T(34)$: $E(34) = \{(a,3),(a,4)\}$ and $r(34) = a$. The parameterization of $\mathcal{M}_T$ in the system of tree cumulants is given as a map $\psi_T: \Omega_T \to f_{p\kappa}(\Delta_{2^n-1})$ by the following proposition.

Proposition 4.4 (Proposition 4.1 in [ ]). Let $T = (V,E)$ be a rooted trivalent tree with $n$ leaves. Then for each $i = 1,\ldots,n$ one has $\lambda_i = \frac12(1 - s_i)$ and

$$\kappa_I = \frac14\big(1 - s_{r(I)}^2\big)\prod_{v\in V(I)\setminus I} s_v^{\deg v - 2}\prod_{(u,v)\in E(I)}\eta_{u,v} \quad\text{for } I\in[n]_{\ge2}, \tag{22}$$

where the degree of $v\in V(I)$ is considered in the subtree $T(I)$.
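Formula (22) can be checked numerically on the quartet tree. The sketch below is our own illustration, not part of the original argument. It assumes the labelling in which the hidden root $a$ is adjacent to the leaves 3, 4 and to the hidden node $b$, and $b$ is adjacent to the leaves 1, 2 (matching $E(34)=\{(a,3),(a,4)\}$ and $r(34)=a$ for Figure 4); the parameter values are arbitrary, and $\eta$ for an edge is keyed by its child node. It sums out the hidden states to get the joint law of the leaves, computes tree cumulants by brute force, and compares them with the right-hand side of (22).

```python
from itertools import product


def bern(prob_one, x):
    """P(Y = x) for a binary variable Y with P(Y = 1) = prob_one."""
    return prob_one if x == 1 else 1.0 - prob_one


def quartet_joint(lam_a, th):
    """Joint law of the leaves (X1, X2, X3, X4) of the hidden quartet tree.

    Assumed labelling: the hidden root a has children b, 3, 4 and the hidden
    node b has children 1, 2. th[v] = (theta_{1|0}, theta_{1|1}), i.e. the
    probability that Y_v = 1 given that its parent is 0 or 1.
    """
    p = {}
    for ha, hb, x1, x2, x3, x4 in product((0, 1), repeat=6):
        w = (bern(lam_a, ha) * bern(th["b"][ha], hb)
             * bern(th[3][ha], x3) * bern(th[4][ha], x4)
             * bern(th[1][hb], x1) * bern(th[2][hb], x2))
        p[(x1, x2, x3, x4)] = p.get((x1, x2, x3, x4), 0.0) + w
    return p


def expect(p, g):
    """E[g(X)] under the finite distribution p."""
    return sum(prob * g(x) for x, prob in p.items())


# Arbitrary illustrative parameters.
lam_a = 0.3
th = {"b": (0.2, 0.7), 1: (0.1, 0.8), 2: (0.25, 0.6), 3: (0.3, 0.9), 4: (0.15, 0.55)}
p = quartet_joint(lam_a, th)
lam = [expect(p, lambda x, i=i: x[i]) for i in range(4)]


def mu(*idx):
    """Central moment mu_I for 0-based leaf indices idx."""
    def g(x):
        out = 1.0
        for i in idx:
            out *= x[i] - lam[i]
        return out
    return expect(p, g)


# Brute-force tree cumulants (kappa_ij = mu_ij; kappa_1234 as in Example 4.3).
kappa_12, kappa_13 = mu(0, 1), mu(0, 2)
kappa_1234 = mu(0, 1, 2, 3) - mu(0, 1) * mu(2, 3)

# Right-hand side of (22) with s_v = 1 - 2 lambda_v and eta_v = theta_{1|1} - theta_{1|0}.
eta = {v: t1 - t0 for v, (t0, t1) in th.items()}
lam_b = (1 - lam_a) * th["b"][0] + lam_a * th["b"][1]
s_a, s_b = 1 - 2 * lam_a, 1 - 2 * lam_b
f_12 = 0.25 * (1 - s_b**2) * eta[1] * eta[2]               # r(12) = b
f_13 = 0.25 * (1 - s_a**2) * eta[1] * eta["b"] * eta[3]    # r(13) = a; a, b have degree 2
f_1234 = (0.25 * (1 - s_a**2) * s_a * s_b                   # a, b have degree 3 in T(1234)
          * eta[1] * eta[2] * eta["b"] * eta[3] * eta[4])
print(kappa_12 - f_12, kappa_13 - f_13, kappa_1234 - f_1234)
```

All three differences should be zero up to floating-point rounding; the factor $s_as_b$ in $\kappa_{1234}$ comes from the two inner nodes having degree three in $T(1234)$.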

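We obtain the following diagram, where the induced parameterization $\psi_T$ is given in the bottom row.

$$\begin{array}{ccc} \Theta_T & \xrightarrow{\ p\ } & \Delta_{2^n-1} \\ {\scriptstyle f_{\theta\omega}}\big\downarrow\big\uparrow{\scriptstyle f_{\omega\theta}} & & \big\downarrow{\scriptstyle f_{p\kappa}} \\ \Omega_T & \xrightarrow{\ \psi_T\ } & f_{p\kappa}(\Delta_{2^n-1}) \end{array} \tag{23}$$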

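Let $\mathcal{I}$ denote the pullback of the ideal $I\subseteq\mathcal{A}_{\Theta_T}$ to the ideal in $\mathcal{A}_{\Omega_T}$ induced by $f_{\omega\theta}$. Thus $\mathcal{I} = f_{\omega\theta}^* I = \{f\circ f_{\omega\theta} : f\in I\}$. The ideal $\mathcal{I}$ describes $\hat\Omega_T = f_{\theta\omega}(\hat\Theta_T)$ as a subset of $\Omega_T$. The pullback of $I$ satisfies

$$\mathcal{I} = (\lambda_1 - \hat\lambda_1, \ldots, \lambda_n - \hat\lambda_n) + \sum_{I\in[n]_{\ge2}}\big(\kappa_I(\omega) - \hat\kappa_I\big), \tag{24}$$

where $\hat\lambda_i$ and $\hat\kappa_I$ are the corresponding coordinates of $f_{p\kappa}(\hat p)$. Here the sum of ideals is again an ideal, generated by the union of the generating sets of the summands.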

For local computations we use the following reduction.

Proposition 4.5 (Proposition 4.6 in [ ]). Let $I\subseteq\mathcal{A}_{x_0}(\mathbb{R}^n)$ and $J\subseteq\mathcal{A}_{y_0}(\mathbb{R}^n)$ be two ideals. If $\mathrm{RLCT}_{x_0}(I) = (\lambda_x, m_x)$ and $\mathrm{RLCT}_{y_0}(J) = (\lambda_y, m_y)$ then

$$\mathrm{RLCT}_{(x_0,y_0)}(I + J) = (\lambda_x + \lambda_y,\ m_x + m_y - 1).$$
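Proposition 4.5 is pure bookkeeping on pairs (threshold, multiplicity). Folding it over $k$ ideals in pairwise disjoint sets of variables gives $(\sum_i\lambda_i,\ \sum_i m_i - (k-1))$, which is where a correction term such as $(0, 1-m-k)$ comes from. The Python sketch below is our own illustration with a hypothetical function name; as an example it uses the standard fact that an ideal generated by one coordinate function has RLCT $(\frac12, 1)$, and by Proposition 4.4 each $\lambda_i - \hat\lambda_i = \frac12(\hat s_i - s_i)$ is such a generator in its own variable $s_i$.

```python
from fractions import Fraction
from functools import reduce


def rlct_disjoint_sum(first, second):
    """Proposition 4.5: RLCT of I + J when I and J involve disjoint variables.

    Each argument is a pair (threshold, multiplicity).
    """
    (lam_x, m_x), (lam_y, m_y) = first, second
    return (lam_x + lam_y, m_x + m_y - 1)


# RLCT of an ideal generated by a single coordinate, e.g. (s_i - s_i^0).
linear = (Fraction(1, 2), 1)

n = 5
print(reduce(rlct_disjoint_sum, [linear] * n))  # threshold n/2, multiplicity 1
```

Exact fractions are used so that thresholds such as $n/2$ or $n/4$ compare without rounding.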

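Theorem 4.6. Let $T$ be a rooted tree with $n$ leaves and $\hat p\in\mathcal{M}_T$. Let $I$ be the ideal defined by (15) and $\mathcal{I}$ the ideal defined by (24). Then

$$\mathrm{RLCT}_{\Theta_T}(I) = \mathrm{RLCT}_{\Omega_T}(\mathcal{I}) = \min_{\omega_0\in\hat\Omega_T}\mathrm{RLCT}_{\Omega_0}(\mathcal{I}), \tag{25}$$

where $\Omega_0$ is a sufficiently small neighborhood of $\omega_0$ in $\Omega_T$. Let $J = \sum_{I\in[n]_{\ge2}}(\kappa_I - \hat\kappa_I)$; then for every $\omega_0\in\hat\Omega_T$

$$\mathrm{RLCT}_{\omega_0}(\mathcal{I}) = \left(\tfrac n2, 0\right) + \mathrm{RLCT}_{\omega_0}(J). \tag{26}$$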
Proof. Since $f_{\omega\theta}$ is an isomorphism with a constant Jacobian, the first part of the theorem follows from Proposition 4.1 (i).

Let $W$ be an $\epsilon$-box around $\omega_0$. If $T$ is rooted in an inner node then by Proposition 4.4 the ideal $J$ does not depend on $s_1,\ldots,s_n$. Since for every $i = 1,\ldots,n$ the expression $\lambda_i - \hat\lambda_i$ depends only on $s_i$,

$$\mathrm{RLCT}_{\omega_0}\big((\lambda_1 - \hat\lambda_1,\ldots,\lambda_n - \hat\lambda_n)\big) = \left(\tfrac n2, 1\right),$$

which can be easily checked (see e.g. Proposition 3.3 in [17]). Equation (26) follows from Proposition 4.5.

Now assume that $T$ is rooted in one of the leaves. In this case both $(\lambda_1 - \hat\lambda_1,\ldots,\lambda_n - \hat\lambda_n)$ and $J$ depend on $s_r$, because $\kappa_I(\omega) = (1 - s_r^2)f_I(\omega)$ for some monomial $f_I(\omega)$ whenever $r\in I$. Therefore we cannot use Proposition 4.5 directly. However, by assumption (A2) $s_i^0\in(-1,1)$ for $i = 1,\ldots,n$. Hence for each $\omega_0$ and a sufficiently small $\epsilon$ one can find two positive constants $c(\epsilon), C(\epsilon)$ such that $c(\epsilon) < 1 - s_r^2 < C(\epsilon)$ in $W$. By Proposition 4.1 (iv) the real log-canonical threshold of $J$ in $W$ is equal to the real log-canonical threshold of an ideal with generators induced from the generators of $J$ by replacing each $1 - s_r^2$ by 1. Now again (26) follows from Proposition 4.5. □

5. The main reduction step

In this section we show that the computations can be reduced to two main cases: first, when $\hat p$ is such that $\hat\kappa_{ij}\ne0$ for all $i,j\in[n]$; second, when $\hat p$ is such that $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$. Moreover, the second case is reduced to computations for monomial ideals, which are usually amenable to various combinatorial techniques.

Let $T$ be a trivalent tree with $n>3$ leaves and let $\hat p\in\mathcal{M}_T$. If all the equivalence classes in $[\hat E]$ are singletons or $[\hat E]$ is empty, which is equivalent to the fact that every inner node has degree at least two in $T$, then Theorem 1.1 gives us the asymptotic approximation for the marginal likelihood. Thus let us assume that there is at least one nontrivial class in $[\hat E]$. Let $T_1,\ldots,T_k$ denote trees representing the equivalence classes in $[\hat E]$ and let $S_1,\ldots,S_m$ denote trees induced by the connected components of $E\setminus\hat E$. Let $L_1,\ldots,L_k$ denote the sets of leaves of $T_1,\ldots,T_k$. For each $S_i$, $i = 1,\ldots,m$, by Remark 5.2 (iv) in [25] its set of leaves, denoted by $[n_i]$, is a subset of $[n]$. We illustrate this notation using the graph below, where the dashed edges represent edges in $\hat E$.

[Figure omitted: example tree illustrating this notation]
Lemma 5.1. Let $T = (V,E)$ be a trivalent rooted tree with $n > 4$ leaves and let $\hat p\in\mathcal{M}_T$. Let $J = \sum_{I\in[n]_{\ge2}}(\kappa_I(\omega) - \hat\kappa_I)$. If $\omega_0\in\hat\Omega_T$ then

$$\mathrm{RLCT}_{\omega_0}(J) = \sum_{i=1}^m\mathrm{RLCT}_{\omega_0}(J(S_i)) + \sum_{i=1}^k\mathrm{RLCT}_{\omega_0}(J(T_i)) + (0, 1 - m - k), \tag{27}$$

where $J(S_i) = \sum_{I\in[n_i]_{\ge2}}(\kappa_I(\omega) - \hat\kappa_I)$ for $i = 1,\ldots,m$ and $J(T_i) = \sum_{w,w'\in L_i}(\kappa_{ww'}(\omega))$ for $i = 1,\ldots,k$.

Proof. We first show that $\sum_{I:\hat\kappa_I = 0}(\kappa_I(\omega)) = \sum_{i,j:\hat\kappa_{ij}=0}(\kappa_{ij}(\omega))$. The inclusion "⊇" is clear. We now show "⊆". First note that for every $I\in[n]_{\ge2}$, if $\hat\kappa_I = 0$ then either $\eta_e = 0$ for an edge $e\in E(I)$ or $s_{r(I)}^2 = 1$. It is easy to check that there exist $i,j\in I$ such that $\hat\kappa_{ij} = 0$ and $r(ij) = r(I)$. It follows by Proposition 4.4 that $\kappa_I(\omega) = \kappa_{ij}(\omega)f(\omega)$ for a polynomial $f(\omega)$, and therefore the inclusion "⊆" is also true. This implies

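$$J = \sum_{I:\hat\kappa_I\ne0}(\kappa_I(\omega) - \hat\kappa_I) + \sum_{I:\hat\kappa_I=0}(\kappa_I(\omega)) = \sum_{i=1}^m J(S_i) + \sum_{i,j:\hat\kappa_{ij}=0}(\kappa_{ij}(\omega)).$$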

Hence to prove the lemma it suffices to show that for every $\omega_0\in\hat\Omega_T$

$$\mathrm{RLCT}_{\omega_0}\Big(\sum_{i=1}^m J(S_i) + \sum_{i,j:\hat\kappa_{ij}=0}(\kappa_{ij}(\omega))\Big) \tag{28}$$

is equal to the right-hand side of (27).

If $e\in E\setminus\hat E$ then by definition there exist $i,j\in[n]$ such that $\hat\kappa_{ij}\ne0$ and $e\in E(ij)$. Since by Proposition 4.4 $\hat\kappa_{ij} = \eta^0_e f(\omega_0)$ for a polynomial $f$, in particular $\eta^0_e\ne0$. It follows that for a sufficiently small $\epsilon$, for each $E'\subseteq E\setminus\hat E$ one can find positive constants $c(\epsilon), C(\epsilon)$ such that $c(\epsilon) < \prod_{e\in E'}\eta_e < C(\epsilon)$ holds in the $\epsilon$-box around $\omega_0$. Similarly if $v\notin\hat V$ (cf. Section 3.2) then there exist positive constants $d(\epsilon), D(\epsilon)$ such that $d(\epsilon) < 1 - s_v^2 < D(\epsilon)$ in the $\epsilon$-box around $\omega_0$. It follows by Proposition 4.1 (iv) that in computations of the real log-canonical threshold in (28) each $\kappa_{ij}(\omega)$ can be replaced by

$$\big(1 - s_{r(ij)}^2\big)^{\delta_r(ij)}\prod_{e\in E(ij)\cap\hat E}\eta_e, \tag{29}$$

where $\delta_r(ij) = 1$ if $r(ij)\in\hat V$ and $\delta_r(ij) = 0$ otherwise. Thus in (28) we can replace the ideal $\sum_{i,j:\hat\kappa_{ij}=0}(\kappa_{ij})$ by the ideal $J_1 = \sum_{i,j:\hat\kappa_{ij}=0}\big((1 - s_{r(ij)}^2)^{\delta_r(ij)}\prod_{e\in E(ij)\cap\hat E}\eta_e\big)$.
However, if we define

$$J_2 = \sum_{i=1}^k\ \sum_{w,w'\in L_i}\Big(\big(1 - s_{r(ww')}^2\big)^{\delta_r(ww')}\prod_{e\in E(ww')}\eta_e\Big) \tag{30}$$

then it can be checked that $J_1 = J_2$. To show this fix $j = 1,\ldots,k$ and $w,w'\in L_j$. Note that by construction each of $w, w'$ either has degree two in $T$ or is a leaf of $T$. Hence by the definition of $\hat E$ there exist $i,j\in[n]$ such that $E(ij)\cap\hat E = E(ww')$. It follows that each generator in (30) is also in the set of generators of $J_1$ and hence $J_2\subseteq J_1$. To show the opposite inclusion note that if $E(ij)$ intersects more than one of the components $T_1,\ldots,T_k$ then the corresponding generator in (29) is a product of some generators in (30) and hence it lies in $J_2$.
Since the generators of every $J(S_i)$ for $i = 1,\ldots,m$ and every

$$\sum_{w,w'\in L_j}\Big(\big(1 - s_{r(ww')}^2\big)^{\delta_r(ww')}\prod_{e\in E(ww')}\eta_e\Big)$$

for $j = 1,\ldots,k$ involve disjoint sets of variables, by Proposition 4.5 the term in (28) is equal to

$$\sum_{i=1}^m\mathrm{RLCT}_{\omega_0}(J(S_i)) + \sum_{j=1}^k\mathrm{RLCT}_{\omega_0}\Big(\sum_{w,w'\in L_j}\Big(\big(1 - s_{r(ww')}^2\big)^{\delta_r(ww')}\prod_{e\in E(ww')}\eta_e\Big)\Big) + (0, 1 - m - k).$$

It can be easily checked by Proposition 4.1 (iv) that for each $i = 1,\ldots,k$

$$\mathrm{RLCT}_{\omega_0}\Big(\sum_{w,w'\in L_i}\Big(\big(1 - s_{r(ww')}^2\big)^{\delta_r(ww')}\prod_{e\in E(ww')}\eta_e\Big)\Big) = \mathrm{RLCT}_{\omega_0}(J(T_i)),$$

which finishes the proof. □

6. The case of zero covariances

In this section we assume that $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$. The aim is to prove the following proposition.

Proposition 6.1. Let $T$ be a trivalent tree with $n\ne3$ leaves rooted in $r\in V$. Let $\hat p\in\mathcal{M}_T$ be such that $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$. Let $J = \sum_{i,j\in[n]}(\kappa_{ij}(\omega))$. Then

$$\min_{\omega_0\in\hat\Omega_T}\mathrm{RLCT}_{\omega_0}(J) = \left(\tfrac n4, m\right),$$

where $m = 1$ if either $r$ is a leaf of $T$ or $r$ together with all its neighbors are inner nodes of $T$. In all other cases we cannot obtain an explicit bound for $m$ beyond $m\ge1$.

It is no coincidence that $J$ here denotes $\sum_{i,j\in[n]}(\kappa_{ij}(\omega))$ while in Section 5 it denotes $\sum_{I\in[n]_{\ge2}}(\kappa_I - \hat\kappa_I)$. In fact, if $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$ then these two ideals are equal (see the beginning of the proof of Lemma 5.1).

The strategy of the proof of Proposition 6.1 is as follows. First, in Section 6.1, we show that the local computations can be restricted to a special subset of $\hat\Omega_T$ over which $J$ can be replaced by a monomial ideal. Then in Section 6.2 we present the method to compute the real log-canonical threshold of a monomial ideal. We use this method in Section 6.3.

6.1. The deepest singularity. First we note that the ideal $J$ in Proposition 6.1 depends on $s_v$ for $v\in V$ only through the value of $s_v^2$. It follows that the computations can be reduced to points satisfying $s_v\ge0$ for all inner nodes $v$ of $T$. Henceforth in this section we always assume this is the case. We define the deepest singularity of $\hat\Omega_T$ as

$$\hat\Omega_{\mathrm{deep}} := \{\omega\in\hat\Omega_T : \eta_e = 0 \text{ for all } e\in E,\ s_v = 1 \text{ for all } v\in\hat V\}. \tag{31}$$

We note that since $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$, we have $\hat E = E$ and $\hat V$ is equal to the set of all inner nodes of $T$, and $\hat\Omega_{\mathrm{deep}}$ is an affine subspace constrained to $\Omega_T$.






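Proposition 6.2. Let $T$ be a tree with $n$ leaves. Let $\hat p\in\mathcal{M}_T$ be such that $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$. Then

$$\min_{\omega_0\in\hat\Omega_T}\mathrm{RLCT}_{\omega_0}(J) = \min_{\omega_0\in\hat\Omega_{\mathrm{deep}}}\mathrm{RLCT}_{\omega_0}(J). \tag{32}$$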

Proof. We build on the proof of Theorem 5.8 in [25]. We first show that $\hat\Omega_T$ is a union of affine subspaces constrained to $\Omega_T$ with a common intersection given by $\hat\Omega_{\mathrm{deep}}$. Let $V_0\subseteq\hat V$ and $E_0\subseteq E$ and

$$\Omega_{(V_0,E_0)} = \{\omega\in\Omega_T : s_v = 1 \text{ for all } v\in V_0,\ \eta_{u,v} = 0 \text{ for all } (u,v)\in E_0\}. \tag{33}$$

We say that $(V_0,E_0)$ is minimal for $\hat p$ if $\kappa_{ij}(\omega) = 0$ for every point $\omega$ in $\Omega_{(V_0,E_0)}$ and for every $i,j\in[n]$, and furthermore $(V_0,E_0)$ is minimal with this property (with respect to inclusion on both coordinates). We now show that the $\hat p$-fiber satisfies

$$\hat\Omega_T = \bigcup_{(V_0,E_0)\ \text{minimal}}\Omega_{(V_0,E_0)}. \tag{34}$$

The first inclusion "⊆" follows from the fact that if $\omega\in\hat\Omega_T$ then $\kappa_{ij}(\omega) = \hat\kappa_{ij} = 0$ for all $i,j\in[n]$. Therefore $\omega\in\Omega_{(V_0,E_0)}$ for some minimal $(V_0,E_0)$. The second inclusion is obvious.

Each $\Omega_{(V_0,E_0)}$ is an affine subspace of $\mathbb{R}^{|V|+|E|}$, denoted by $M_{(V_0,E_0)}$, constrained to $\Omega_T$. Let $\mathcal{S}$ denote the intersection lattice of all $M_{(V_0,E_0)}$ for $(V_0,E_0)$ minimal (cf. Section 3.1 in [7]) with ordering denoted by $\le$. For each $i\in\mathcal{S}$ let $M^{(i)}$ denote the corresponding intersection and define

$$S_i = M^{(i)}\setminus\bigcup_{j<i}M^{(j)}. \tag{35}$$

In this way we obtain an $\mathcal{S}$-induced decomposition of $\mathbb{R}^{|V|+|E|}$.

By [ , Example 9.3.17] the function $\omega\mapsto\mathrm{rlct}_\omega(J)$ is lower semicontinuous (the argument used there works over the real numbers). This means that for every $\omega_0\in\hat\Omega_T$ and $\epsilon>0$ there exists a neighborhood $U$ of $\omega_0$ such that $\mathrm{rlct}_{\omega_0}(J) < \mathrm{rlct}_\omega(J) + \epsilon$ for all $\omega\in U$. Since the set of values of the real log-canonical threshold is discrete, this means that for every $\omega_0\in\hat\Omega_T$ and any sufficiently small neighborhood $W_0$ of $\omega_0$ one has $\mathrm{rlct}_{\omega_0}(J)\le\mathrm{rlct}_\omega(J)$ for all $\omega\in W_0$. Since for any neighborhood $W_0$ of $\omega_0\in\hat\Omega_{\mathrm{deep}}$ we have $W_0\cap S_i\ne\emptyset$ for all $i\in\mathcal{S}$, the minimum of the real log-canonical threshold is necessarily attained at a point of the deepest singularity. □

Proposition 6.2 shows that in the singular case we can restrict our analysis to a neighborhood of $\hat\Omega_{\mathrm{deep}}$. Often, however, we also consider points in the bigger set

$$\hat\Omega_0 = \{\omega\in\hat\Omega_T : \eta_{u,v} = 0 \text{ for all } (u,v)\in E\}.$$

Note that $\hat\Omega_{\mathrm{deep}}$ lies on the boundary of $\Omega_T$, but $\hat\Omega_0$ also contains interior points of $\Omega_T$, which will be crucial for some of the arguments later.

Lemma 6.3. Assume that $\hat p\in\mathcal{M}_T$ is such that $\hat\kappa_{ij} = 0$ for all $i,j\in[n]$. Let $J = \sum_{i,j\in[n]}(\kappa_{ij}(\omega))$ and let $J(\omega_0)$ be the ideal $J$ translated to the origin. Then for every $\omega_0\in\hat\Omega_0$

$$\mathrm{RLCT}_0(J(\omega_0)) = \mathrm{RLCT}_0(J'), \tag{36}$$

where $J'$ is a monomial ideal such that each $\kappa_{ij}(\omega + \omega_0)$ in the set of generators of $J(\omega_0)$ is replaced either by

• $s_{r(ij)}\prod_{(u,v)\in E(ij)}\eta_{u,v}$ if $s^0_{r(ij)} = 1$, or by

• $\prod_{(u,v)\in E(ij)}\eta_{u,v}$ if $s^0_{r(ij)}\ne1$.

Proof. Let $i,j\in[n]$ and assume that $\omega_0\in\hat\Omega_0$. Then by Proposition 4.4

$$\kappa_{ij}(\omega + \omega_0) = \frac14\Big(1 - \big(s_{r(ij)} + s^0_{r(ij)}\big)^2\Big)\prod_{e\in E(ij)}\eta_e.$$

If $s^0_{r(ij)}\ne1$, then for a sufficiently small $\epsilon>0$ there exist positive constants $c(\epsilon), C(\epsilon)$ such that $c(\epsilon) < 1 - (s_{r(ij)} + s^0_{r(ij)})^2 < C(\epsilon)$ for $s_{r(ij)}\in(-\epsilon,\epsilon)$. Therefore by Proposition 4.1 (iv) we can replace this term in (36) with 1. If $s^0_{r(ij)} = 1$, rewrite $1 - (1 + s_{r(ij)})^2$ as $-s_{r(ij)}(2 + s_{r(ij)})$. For a sufficiently small $\epsilon$ we can find two positive constants $c(\epsilon), C(\epsilon)$ such that $c < 2 + s_{r(ij)} < C$ whenever $s_{r(ij)}\in(-\epsilon,\epsilon)$. Again by Proposition 4.1 (iv) we can replace this term with 1. This proves equation (36). □

Since $J'$ is a monomial ideal, by Corollary 5.3 in [11] we can compute $\mathrm{RLCT}_0(J')$ using the method of Newton diagrams. We present this method in the following subsection.

6.2. Newton diagram method. Given an analytic function $f\in\mathcal{A}_0$ we pick local coordinates $x = (x_1,\ldots,x_d)$ in a neighborhood of the origin. This allows us to represent $f$ as a power series in $x_1,\ldots,x_d$ such that $f(x) = \sum_\alpha c_\alpha x^\alpha$. The exponents of the terms of $f$ are vectors in $\mathbb{N}^d$. The Newton polyhedron of $f$, denoted by $\Gamma_+(f)$, is the convex hull of the subset

$$\{\alpha + \alpha' : c_\alpha\ne0,\ \alpha'\in\mathbb{R}^d_{\ge0}\}.$$

A subset $\gamma\subseteq\Gamma_+(f)$ is a face of $\Gamma_+(f)$ if there exists $\beta\in\mathbb{R}^d$ such that

$$\gamma = \{\alpha\in\Gamma_+(f) : \langle\alpha,\beta\rangle\le\langle\alpha',\beta\rangle \text{ for all } \alpha'\in\Gamma_+(f)\}.$$

If $\gamma$ is a subset of $\Gamma_+(f)$ then we define $f_\gamma(x) = \sum_{\alpha\in\gamma\cap\mathbb{N}^d}c_\alpha x^\alpha$. The principal part of $f$ is by definition the sum of all terms of $f$ supported on all compact faces of $\Gamma_+(f)$.

Example 6.4. Let $f(x,y) = x^3 + 2xy + 6x^2y + 3x^4y + y^2$. Then the Newton diagram looks as follows

[Figure: Newton diagram of $f$]

where the dots correspond to the terms of $f$. There are only two bounded facets of $\Gamma_+(f)$ and the principal part of $f$ is equal to $\bar f(x,y) = x^3 + 2xy + y^2$.
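For two variables the compact faces of $\Gamma_+(f)$ and the principal part can be computed with a lower convex hull. The Python sketch below is our own illustration, restricted to $d = 2$, with hypothetical function names; it stores $f$ as a dictionary from exponent pairs to coefficients and recovers the two bounded edges and the principal part of Example 6.4.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); zero iff o, a, b are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def newton_vertices(exponents):
    """Vertices of the compact faces of conv(A) + R^2_{>=0}, from left to right."""
    pts = sorted(set(exponents))
    hull = []                       # lower convex hull (Andrew's monotone chain)
    for q in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], q) <= 0:
            hull.pop()
        hull.append(q)
    chain = [hull[0]]               # keep the part where the second coordinate drops
    for q in hull[1:]:
        if q[1] >= chain[-1][1]:
            break
        chain.append(q)
    return chain


def on_segment(q, a, b):
    """True if the lattice point q lies on the closed segment [a, b]."""
    return (cross(a, b, q) == 0
            and min(a[0], b[0]) <= q[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= q[1] <= max(a[1], b[1]))


def principal_part(f):
    """Terms of f (dict: exponent -> coefficient) supported on compact faces."""
    chain = newton_vertices(f)
    faces = list(zip(chain, chain[1:])) or [(chain[0], chain[0])]
    return {a: c for a, c in f.items() if any(on_segment(a, u, v) for u, v in faces)}


# Example 6.4: f = x^3 + 2xy + 6x^2 y + 3x^4 y + y^2
f = {(3, 0): 1, (1, 1): 2, (2, 1): 6, (4, 1): 3, (0, 2): 1}
print(newton_vertices(f))   # [(0, 2), (1, 1), (3, 0)]: two bounded edges
print(principal_part(f))    # {(3, 0): 1, (1, 1): 2, (0, 2): 1}, i.e. x^3 + 2xy + y^2
```

Collinear lattice points are removed from the hull vertices but still counted by `on_segment`, since terms in the relative interior of a compact edge belong to the principal part as well.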

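Definition 6.5. The principal part of the power series $f$ with real coefficients is $\mathbb{R}$-nondegenerate if for all compact faces $\gamma$ of $\Gamma_+(f)$

$$\Big\{x\in\mathbb{R}^d : \frac{\partial f_\gamma}{\partial x_1}(x) = \cdots = \frac{\partial f_\gamma}{\partial x_d}(x) = 0\Big\}\subseteq\{x\in\mathbb{R}^d : x_1\cdots x_d = 0\}. \tag{37}$$

From the geometric point of view this condition means that the singular locus of the hypersurface defined by $f_\gamma(x) = 0$ lies outside of $(\mathbb{R}^*)^d$ for all compact faces $\gamma$ of $\Gamma_+(f)$.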

The following theorem shows that if the principal part of $f$ is $\mathbb{R}$-nondegenerate and $f\in\mathcal{A}_0$, the computations in Theorem 2.3 are greatly facilitated. An example of an application of these methods in statistical analysis can be found in [23].

Theorem 6.6 (Theorem 5.6, [ 1]). Let $f\in\mathcal{A}_0(\mathbb{R}^d)$ and $f(0) = 0$. If the principal part of $f$ is $\mathbb{R}$-nondegenerate then $\mathrm{RLCT}_0(f) = (\frac1t, c)$, where $t$ is the smallest number such that the vector $(t,\ldots,t)$ hits the polyhedron $\Gamma_+(f)$ and $c$ is the codimension of the face it hits.
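For $d = 2$ the quantities in Theorem 6.6 can be computed exactly. The Python sketch below is our own illustration, not a general implementation: it works only in two variables, does not check $\mathbb{R}$-nondegeneracy or nonnegativity (those hypotheses are simply assumed), and the function names are hypothetical. It finds $t$ by minimizing $\max(y_1, y_2)$ over the Newton polygon, which is attained either at an exponent or where a segment between two exponents crosses the diagonal, and it reads off $c$ by checking whether $(t,t)$ is a vertex (codimension 2) or lies in the relative interior of an edge (codimension 1).

```python
from fractions import Fraction
from itertools import combinations


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def newton_vertices(exponents):
    """Vertices of conv(A) + R^2_{>=0}: lower hull restricted to its descending part."""
    pts = sorted(set(exponents))
    hull = []
    for q in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], q) <= 0:
            hull.pop()
        hull.append(q)
    chain = [hull[0]]
    for q in hull[1:]:
        if q[1] >= chain[-1][1]:
            break
        chain.append(q)
    return chain


def diagonal_hit(exponents):
    """Smallest t such that (t, t) lies in conv(A) + R^2_{>=0}."""
    exps = list(exponents)
    t = Fraction(min(max(a) for a in exps))
    for a, b in combinations(exps, 2):
        da, db = a[0] - a[1], b[0] - b[1]
        if da * db < 0:                      # a and b on opposite sides of the diagonal
            w = Fraction(db, db - da)        # weight on a that equalizes the coordinates
            t = min(t, w * a[0] + (1 - w) * b[0])
    return t


def rlct_newton_2d(f):
    """(1/t, c) as in Theorem 6.6, assuming its hypotheses hold for f."""
    t = diagonal_hit(f)
    c = 2 if (t, t) in newton_vertices(f) else 1   # vertex: codim 2, edge: codim 1
    return (1 / t, c)


print(rlct_newton_2d({(2, 0): 1, (0, 2): 1}))  # x^2 + y^2
print(rlct_newton_2d({(2, 2): 1}))             # x^2 y^2
```

Exact rational arithmetic keeps the vertex test `(t, t) in ...` reliable; with floating point the equality check could fail.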

Let now $f\in\mathcal{A}_\Theta$ be such that $f(\theta_0) = 0$. We can then center $f$ at $\theta_0$, obtaining a function in $\mathcal{A}_0$. If $f$ is nonnegative then we can use Theorem 6.6 to compute $\mathrm{RLCT}_{\theta_0}(f)$. Note that in general this theorem will not give us $\mathrm{RLCT}_\Theta(f)$ if $\theta_0$ is a boundary point of $\Theta$. For a discussion see [1, Section 8.3.4] and Example 2.7 in [ ].

6.3. Proof of Proposition 6.1. Let $n\ge4$. For each $\omega_0\in\hat\Omega_0$ let $\delta = \delta(\omega_0)\in\{0,1\}^V$ denote the indicator vector satisfying $\delta_v = 1$ if $s^0_v = 1$ and $\delta_v = 0$ otherwise. In particular $\delta_i = 0$ for all $i = 1,\ldots,n$ because the leaves are assumed to be non-degenerate. Let $V_\delta = \mathbb{R}^{n_e+|\delta|} = \mathbb{R}^{|\delta|}\times\mathbb{R}^{n_e}$ be a real space with variables representing the edges $(x_e)_{e\in E}$ and the nodes $(y_v)$ for all $v$ such that $\delta_v = 1$. With some arbitrary numbering of the nodes and edges we order the variables as follows: $y_1\prec\cdots\prec y_{|\delta|}\prec x_1\prec\cdots\prec x_{n_e}$. In Lemma 6.3, for each $\omega_0\in\hat\Omega_0$, we reduced our computations to the analysis of $\mathrm{RLCT}_0(J')$, where $J'$ has a simple monomial form. Let $Q_\delta$ be the polynomial on $\Omega_T$ defined as the sum of squares of the generators of $J'$. In particular $\mathrm{RLCT}_0(J') = \mathrm{RLCT}_0(Q_\delta)$. The exponents of the terms of the polynomial $Q_\delta(\omega)$ are vectors in $\{0,2\}^{n_e+|\delta|}$. We have that

$$Q_\delta(\omega) = \sum_{i,j\in[n]} s_{r(ij)}^{2\delta_{r(ij)}}\prod_{(u,v)\in E(ij)}\eta_{u,v}^2. \tag{38}$$

If $f$ is a polynomial then the convex hull of the exponents of its terms is called the Newton polytope and denoted $\Gamma(f)$. Since each term of $Q_\delta$ corresponds to a path between two leaves, the construction of the Newton polytope $\Gamma(Q_\delta)\subseteq V_\delta$ gives a direct relationship between paths in $T$ and points generating the polytope. Convex combinations of points corresponding to paths give rise to points in the polytope. Let $E_0\subseteq E$ be the subset of edges of $T$ such that one of the ends is in the set of leaves of $T$. We call these edges terminal. Note that each point generating $\Gamma(Q_\delta)$ satisfies $\sum_{e\in E_0}x_e = 4$. This follows from the fact that each of these points corresponds to a path between two leaves in $T$ and every such path needs to cross exactly two terminal edges. Consequently each point of $\Gamma(Q_\delta)$ satisfies this equation as well. The induced facet of the Newton polyhedron $\Gamma_+(Q_\delta)$ is given as

$$F = \Big\{(y,x)\in\Gamma_+(Q_\delta) : \sum_{e\in E_0}x_e = 4\Big\} \tag{39}$$

and each point of $\Gamma_+(Q_\delta)$ satisfies $\sum_{e\in E_0}x_e\ge4$.

The following lemma proves a part of Proposition 6.1.

Lemma 6.7 (The real log-canonical threshold of $J$). Let $T$ be a trivalent tree with $n$ leaves, where $n\ne3$. If $\omega_0\in\hat\Omega_0$ then $\mathrm{rlct}_0(J') = \frac n4$.

Proof. If $n = 2$ then, since $s^0_1, s^0_2\ne1$, by Lemma 6.3 we have that

$$\mathrm{RLCT}_{\omega_0}(J) = \mathrm{RLCT}_{\omega_0}((\kappa_{12}(\omega))) = \mathrm{RLCT}_0(\eta_{1,2}^2) = \left(\tfrac12, 1\right).$$

Therefore Proposition 6.1 holds in this case. Now assume that $n\ge4$. By Theorem 6.6 we have to show that $t = 4/n$ is the smallest $t$ such that the vector $(t,\ldots,t)$ hits $\Gamma_+(Q_\delta)$. To show that $\frac4n\mathbf{1}\in\Gamma_+(Q_\delta)$ we construct a point $q\in\Gamma(Q_\delta)$ such that $q\le\frac4n\mathbf{1}$ coordinatewise. The point is constructed as follows.

Construction 6.8. Let $T = (V,E)$ be a trivalent rooted tree with $n\ge4$ leaves. We present two constructions of networks of paths between the leaves of $T$.

The first construction is for the case when $\delta_r = 1$. In this case in particular $T$ is rooted in an inner node. If $n = 4$ then the network consists of the two paths within the cherries, each counted with multiplicity two.

Each of the paths corresponds to a point in $\Gamma(Q_\delta)$. We order the coordinates of $V_\delta = \mathbb{R}^{5+|\delta|}$ by $y_a\prec y_b\prec x_1\prec\cdots\prec x_5$, where $y_a$ and $y_b$ are included only if $\delta_a = 1$ and $\delta_b = 1$, respectively. For example the point corresponding to the path involving edges $e_1$ and $e_2$ is $(2,0;2,2,0,0,0)$. The barycenter of the points corresponding to all four paths in the network is $(1,1;1,1,0,1,1)$, both if $T$ is rooted in $a$ and if it is rooted in $b$.

If $n>4$ then we build the network recursively. Assume that $T$ is rooted in an inner node $a$ and pick an inner edge $(a,b)$. Label the edges incident with $a$ and $b$ as for the quartet tree above and consider the subtree given by the quartet tree. Draw four paths as in the picture above. Let $v$ be any leaf of the quartet subtree which is not a leaf of $T$ and label the two additional edges incident with $v$ by $e_6$ and $e_7$. Now we extend the network by adding $e_6$ to one of the paths terminating in $v$ and $e_7$ to the other. Next we add an additional path involving only $e_6$ and $e_7$, as in the picture below. By construction $v$ is the root of the additional path. We extend the network cherry by cherry until it covers all terminal edges.





Note that we have made some choices in building up the network and hence the construction is not unique. However, each of the inner nodes is always the root of at least one and at most two paths. Moreover, each edge is covered at most twice and each terminal edge is covered exactly twice. We have $n$ paths in the network, all representing points of $\Gamma(Q_\delta)$, denoted by $q_1,\ldots,q_n$. Let $q = \frac1n\sum_{i=1}^n q_i$; then $q\in\Gamma(Q_\delta)$ is given by $x_{ab} = 0$ and $x_e = 4/n$ for all $e\in E\setminus\{(a,b)\}$. The other coordinates by construction satisfy $y_a = 4/n$, $y_b = 4/n$ if $\delta_b = 1$, and $y_v = 2/n$ for all $v\in V\setminus\{a,b\}$ such that $\delta_v = 1$.

If $\delta_r = 0$ then we proceed as follows. For $n = 4$ consider a network of all the possible paths, each counted with multiplicity one apart from the cherry paths (paths of length two), which are counted with multiplicity two. This makes eight paths and each edge is covered exactly four times. With the order of the coordinates as above, the coordinates of the point representing the barycenter of all paths in the network satisfy $x_e = 1$ for all $e\in E$ and $y_v = 1/2$ for all $v$ such that $\delta_v = 1$. This construction generalizes recursively in a similar way as the one for $T$ rooted in an inner node. We always have $2n$ paths and each edge is covered exactly four times. The network induces a point $q\in\Gamma(Q_\delta)$ with coordinates given by $y_v = 2/n$ for all $v\in V$ such that $\delta_v = 1$ and $x_e = 4/n$ for $e\in E$. (This finishes the construction.)

The point $\frac4n\mathbf{1}$ lies in $\Gamma_+(Q_\delta)$. This follows from Construction 6.8 and the fact that the constructed point $q\in\Gamma(Q_\delta)$ satisfies $q\le\frac4n\mathbf{1}$. Moreover, for any $s<4/n$ the point $s(1,\ldots,1)$ does not satisfy $\sum_{e\in E_0}x_e\ge4$ and hence it cannot be in $\Gamma_+(Q_\delta)$. It follows that $4/n$ is the smallest $t$ such that $t\mathbf{1}\in\Gamma_+(Q_\delta)$ and therefore $\mathrm{rlct}_0(J') = n/4$. We note that the result does not depend on $\delta$.

□
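For $n = 4$ both networks of Construction 6.8 can be verified mechanically. The Python sketch below is our own check under stated assumptions: a hypothetical labelling of the quartet in which $a$ is adjacent to the leaves 3, 4 and $b$ to the leaves 1, 2, with edges named `e1`, ..., `e5` as in the code, and $s^0_a = s^0_b = 1$. It builds the exponent vectors of the terms of $Q_\delta$ in (38) (coordinate 2 on each edge of the path, and 2 on $y_{r(ij)}$ when $\delta_{r(ij)} = 1$), checks $\sum_{e\in E_0}x_e = 4$ for each, and averages them.

```python
from fractions import Fraction

# Hypothetical labelling of the quartet: a ~ 3, 4, b and b ~ 1, 2; e3 is the inner edge.
EDGES = {"e1": ("a", "3"), "e2": ("a", "4"), "e3": ("a", "b"),
         "e4": ("b", "1"), "e5": ("b", "2")}
TERMINAL = ["e1", "e2", "e4", "e5"]


def path(u, w):
    """Nodes and edge names on the unique path from u to w (depth-first search)."""
    adj = {}
    for name, (x, y) in EDGES.items():
        adj.setdefault(x, []).append((y, name))
        adj.setdefault(y, []).append((x, name))
    stack = [(u, None, [u], [])]
    while stack:
        node, parent, nodes, edges = stack.pop()
        if node == w:
            return nodes, edges
        for nxt, name in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, nodes + [nxt], edges + [name]))
    raise ValueError("nodes are not connected")


def exponent(i, j, root, delta):
    """Exponent vector of the term of Q_delta coming from kappa_ij, as in (38)."""
    nodes, edges = path(i, j)
    top = min(nodes, key=lambda v: len(path(root, v)[1]))   # r(ij): closest to the root
    point = {"y_a": 0, "y_b": 0, **{e: 0 for e in EDGES}}
    for e in edges:
        point[e] = 2
    if delta.get(top, 0) == 1:
        point["y_" + top] = 2
    return point


def barycenter(pairs, root, delta):
    points = [exponent(i, j, root, delta) for i, j in pairs]
    if any(sum(p[e] for e in TERMINAL) != 4 for p in points):
        raise ValueError("a generating point violates sum over E_0 of x_e = 4")
    return {k: Fraction(sum(p[k] for p in points), len(points)) for k in points[0]}


delta = {"a": 1, "b": 1}          # s_a^0 = s_b^0 = 1; leaves are non-degenerate
# delta_r = 1 (rooted at a): both cherry paths, each counted twice.
q1 = barycenter([("3", "4"), ("3", "4"), ("1", "2"), ("1", "2")], "a", delta)
# delta_r = 0 (rooted at leaf 1): all six paths, cherry paths counted twice.
pairs8 = [("1", "2"), ("1", "2"), ("3", "4"), ("3", "4"),
          ("1", "3"), ("1", "4"), ("2", "3"), ("2", "4")]
q0 = barycenter(pairs8, "1", delta)
print(q1)
print(q0)
```

Every coordinate of both barycenters is at most $4/n = 1$, which is the inequality $q\le\frac4n\mathbf{1}$ used in the proof of Lemma 6.7.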

To compute the multiplicity of the real log-canonical threshold of $Q_\delta$ we need a better understanding of the polyhedron $\Gamma_+(Q_\delta)$. According to Theorem 6.6 we need to find the codimension of the face of $\Gamma_+(Q_\delta)$ hit by $\frac4n\mathbf{1}$. First we find the hyperplane representation of the Newton polytope $\Gamma(Q_\delta)$, reducing the problem to a simpler but equivalent one.

Definition 6.9 (A pair-edge incidence polytope). Let $T = (V,E)$ be a trivalent tree with $n\ge4$ leaves. We define a polytope $P_n\subseteq\mathbb{R}^{n_e}$, where $n_e = 2n-3$, as the convex hull of the points $(q_{ij})_{i,j\in[n]}$, where the $k$-th coordinate of $q_{ij}$ is one if the $k$-th edge is in the path between $i$ and $j$ and zero otherwise. We call $P_n$ a pair-edge incidence polytope by analogy to the pair-edge incidence matrix defined by Mihaescu and Pachter [12, Definition 1].
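The points $q_{ij}$ of Definition 6.9 are easy to generate. The Python sketch below is our own illustration (the tree encodings as edge lists and the function names are hypothetical): it builds the pair-edge incidence matrix for the quartet tree and for a five-leaf trivalent tree and computes its rank over the rationals. In both cases the rank equals $n_e$, in line with the full-rank property used in the proof of Lemma 6.10.

```python
from fractions import Fraction
from itertools import combinations


def path_edges(edges, u, w):
    """Indices of the edges on the unique path from node u to node w in a tree."""
    adj = {}
    for k, (x, y) in enumerate(edges):
        adj.setdefault(x, []).append((y, k))
        adj.setdefault(y, []).append((x, k))
    stack = [(u, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == w:
            return used
        for nxt, k in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [k]))
    raise ValueError("nodes are not connected")


def incidence_points(edges, leaves):
    """Points q_ij of Definition 6.9, i.e. the rows of the pair-edge incidence matrix."""
    rows = []
    for i, j in combinations(leaves, 2):
        on_path = set(path_edges(edges, i, j))
        rows.append([1 if k in on_path else 0 for k in range(len(edges))])
    return rows


def rank(rows):
    """Rank over the rationals (Gauss-Jordan elimination with exact fractions)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


quartet = [("a", 3), ("a", 4), ("a", "b"), ("b", 1), ("b", 2)]
five = [("a", 1), ("a", 2), ("a", "b"), ("b", 3), ("b", "c"), ("c", 4), ("c", 5)]
for edges, leaves in [(quartet, [1, 2, 3, 4]), (five, [1, 2, 3, 4, 5])]:
    print(len(leaves), rank(incidence_points(edges, leaves)), len(edges))
```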

The reason to study the pair-edge incidence polytope is that its structure can be handled easily and it can be shown to be affinely equivalent to $\Gamma(Q_\delta)$. This is immediate if $\delta = (0,\ldots,0)$ since $\Gamma(Q_0) = 2P_n$. For an arbitrary $\delta$ fix a rooting $r$ of $T$ and define a linear map $f_r: \mathbb{R}^{n_e}\to\mathbb{R}^{|\delta|}$ as follows. For each $v\in V\setminus\{r\}$ such that $\delta_v = 1$ set

$$y_v = \tfrac12\big(x_{v\,\mathrm{ch}_1(v)} + x_{v\,\mathrm{ch}_2(v)} - x_{\mathrm{pa}(v)\,v}\big),$$

where $\mathrm{ch}_1(v), \mathrm{ch}_2(v)$ denote the two children of $v$ and $\mathrm{pa}(v)$ its parent. If $\delta_r = 1$ then set

$$y_r = \tfrac12\big(x_{r\,\mathrm{ch}_1(r)} + x_{r\,\mathrm{ch}_2(r)} + x_{r\,\mathrm{ch}_3(r)}\big).$$

It can be easily checked that for the map $(\mathrm{id}\times f_r): \mathbb{R}^{n_e}\to\mathbb{R}^{n_e}\times\mathbb{R}^{|\delta|}$ one has $(\mathrm{id}\times f_r)(2P_n) = \Gamma(Q_\delta)$. This follows from the fact that for each point $y_r = 2$ if and only if the path crosses $r$, and for any other node $y_v = 2$ if and only if the path crosses $v$ and $v$ is the root of the path, i.e. if the path crosses both children of $v$.

Lemma 6.10. Let $P_n\subseteq\mathbb{R}^{n_e}$ be a pair-edge incidence polytope for a trivalent tree with $n$ leaves, where $n\ge4$. Then $\dim(P_n) = n_e - 1 = 2n - 4$. Hence the codimension of $P_n$ is one. The unique equation defining $P_n$ is $\sum_{e\in E_0}x_e = 2$. For each inner node $v\in V$ let $e_1(v), e_2(v), e_3(v)$ denote the three adjacent edges. Then exactly $3(n-2)$ facets define $P_n$ and they are given by

$$x_{e_1(v)} + x_{e_2(v)} - x_{e_3(v)}\ge0,\quad x_{e_2(v)} + x_{e_3(v)} - x_{e_1(v)}\ge0,\quad x_{e_3(v)} + x_{e_1(v)} - x_{e_2(v)}\ge0 \quad\text{for all inner nodes } v\in V. \tag{40}$$
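The easy half of Lemma 6.10, namely that every point $q_{ij}$ satisfies the equation $\sum_{e\in E_0}x_e = 2$ and all $3(n-2)$ inequalities (40), can be checked by machine on small trees. The Python sketch below is our own illustration with hypothetical names and tree encodings; it does not prove the facet description, it only confirms that the generating points lie in the region the lemma describes.

```python
from itertools import combinations


def path_edges(edges, u, w):
    """Indices of the edges on the unique path from node u to node w in a tree."""
    adj = {}
    for k, (x, y) in enumerate(edges):
        adj.setdefault(x, []).append((y, k))
        adj.setdefault(y, []).append((x, k))
    stack = [(u, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == w:
            return used
        for nxt, k in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [k]))
    raise ValueError("nodes are not connected")


def check_points(edges, leaves):
    """Check the defining equation and the inequalities (40) on every q_ij.

    Returns (all_satisfied, number_of_inequalities), the latter being 3 * #inner nodes.
    """
    leaf_set = set(leaves)
    terminal = [k for k, (x, y) in enumerate(edges) if x in leaf_set or y in leaf_set]
    around = {}                                   # inner node -> its adjacent edges
    for k, (x, y) in enumerate(edges):
        for v in (x, y):
            if v not in leaf_set:
                around.setdefault(v, []).append(k)
    if any(len(ks) != 3 for ks in around.values()):
        raise ValueError("expected a trivalent tree")
    ok = True
    for i, j in combinations(leaves, 2):
        on_path = set(path_edges(edges, i, j))
        x = [1 if k in on_path else 0 for k in range(len(edges))]
        ok = ok and sum(x[k] for k in terminal) == 2
        for a, b, c in around.values():
            ok = ok and x[a] + x[b] >= x[c] and x[b] + x[c] >= x[a] and x[c] + x[a] >= x[b]
    return ok, 3 * len(around)


quartet = [("a", 3), ("a", 4), ("a", "b"), ("b", 1), ("b", 2)]
six = [("a", 1), ("a", 2), ("a", "b"), ("b", 3), ("b", "c"),
       ("c", 4), ("c", "d"), ("d", 5), ("d", 6)]
print(check_points(quartet, [1, 2, 3, 4]))      # (True, 6): 3(n - 2) with n = 4
print(check_points(six, [1, 2, 3, 4, 5, 6]))    # (True, 12): 3(n - 2) with n = 6
```

The inequalities hold because a path between two leaves uses either none or exactly two of the three edges at any inner node.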

Proof. Let $M_n$ be the pair-edge incidence matrix, i.e. a $\binom n2\times n_e$ matrix with rows corresponding to the points defining $P_n$. By Lemma 1 in [ ] the matrix has full rank and hence $P_n$ has codimension one in $\mathbb{R}^{n_e}$. Moreover, since each path necessarily crosses two terminal edges, each point generating $P_n$ satisfies the equation $\sum_{e\in E_0}x_e = 2$, and hence this is the equation defining the affine subspace containing $P_n$.

Now we show that the inequalities give a valid facet description of $P_n$. This can be checked directly for $n = 4$ (e.g. using Polymake [ ]). Assume this is true for all $k<n$. By $Q_n$ we denote the polytope defined by the equation $\sum_{e\in E_0}x_e = 2$ and the $3(n-2)$ inequalities given by (40). It is obvious that $P_n\subseteq Q_n$ since all points generating $P_n$ satisfy the equation and the inequalities. We show that the opposite inclusion also holds.

Consider any cherry $\{e_1,e_2\}\subseteq E$ in the tree given by two leaves denoted by $1, 2$ and the separating inner node $a$. Note that the inequalities in (40) imply in particular that $x_e\ge0$ for all $e\in E$. Define a projection $\pi: \mathbb{R}^{n_e}\to\mathbb{R}^{n_e-2}$ on the coordinates related to all the edges apart from the two in the cherry. We have $\pi(Q_n) = \hat Q_{n-1}$, where $\hat P = \mathrm{conv}\{0, P\}$ is a cone with the base given by $P$. The projection $\pi(Q_n)$ is described by all the triples of inequalities for all the inner nodes apart from the one incident with the cherry, and the defining equation becomes an inequality

$$\sum_{e\in E_0\setminus\{e_1,e_2\}} x_e \le 2.$$

Denote the edge incident with $e_1, e_2$ by $e_3$ and the related coordinates of $x$ by $x_1, x_2, x_3$. The three inequalities involving $x_1$ and $x_2$ do not affect the projection since they imply that

$$\max\{x_1 - x_2,\ x_2 - x_1\}\le x_3\le x_1 + x_2,$$

and hence in particular if $x_1 = x_2$ the constraint becomes $[0, 2x_1]$. Consequently the set given by $x_1 + x_2 - x_3\ge0$, $x_1 + x_3 - x_2\ge0$, $x_2 + x_3 - x_1\ge0$ projects down to $\mathbb{R}_{\ge0}$. However, since $\hat Q_{n-1}$ is contained in the nonnegative orthant, there are no additional constraints on $x_3$. The inequalities in (40) define a polyhedral cone and the equation $\sum_{e\in E_0\setminus\{e_1,e_2\}}x_e = t$ for $t\ge0$ cuts out a bounded slice of the cone which is equal to $t\cdot P_{n-1}$. The union of all these slices for $t\in[0,2]$ is exactly $\hat Q_{n-1}$. Since $Q_{n-1} = P_{n-1}$ by induction, each $\pi(x)$ is a convex combination of the points generating $P_{n-1}$ and zero, i.e. $\pi(x) = \sum c_{ij}p_{ij}$, where the sum is over all $i\ne j\in\{a,3,\ldots,n\}$ and $c_{ij}\ge0$, $\sum c_{ij}\le1$.
Let $x\in Q_n$. Since $\pi(x)\in\hat{P}_{n-1}$, we can write it as the convex combination above. Next we lift this combination back to $Q_n$ and show that any such lift has to lie in $P_n$. This would imply in particular that $x\in P_n$. Let $y$ denote a lift of $\pi(x)$ to $Q_n$. We have

$$y=\sum c_{ij}r_{ij}+\Big(1-\sum c_{ij}\Big)r_0,$$

where $r_{ij}$ is a lift of $\pi(p_{ij})$ and $r_0$ is a lift of the origin. It suffices to show that each $r_{ij}$ and $r_0$ necessarily lie in $P_n$.

Consider the following three cases. First, if $p_{ij}\in P_{n-1}$ is such that $x_3=0$, then the sum of all the other coordinates related to the terminal edges is two, since $P_{n-1}=Q_{n-1}$ and $Q_{n-1}$ satisfies the equation $\sum_{e\in E\setminus\{e_1,e_2\}}x_e=2$. Hence if we lift $\pi(p_{ij})$ to $Q_n$, then $x_3=0$ and

$$x_1+x_2\geq 0,\quad x_1-x_2\geq 0,\quad x_2-x_1\geq 0$$

by plugging $x_3=0$ into the three inequalities for the node $a$. But since $r_{ij}$ must also satisfy the equation $\sum_{e\in E}x_e=2$, and since we already have

$$\sum_{e\in E\setminus\{e_1,e_2\}}x_e=2,$$

then $x_1+x_2=0$ and hence $x_1=x_2=0$. Consequently, $r_{ij}$ is a vertex of $P_n$.
Second, if $p_{ij}$ is a vertex of $P_{n-1}$ such that $x_3=1$, then the sum of all the other coordinates of $p_{ij}$ related to the terminal edges is one and hence, since the lift is in $Q_n$, we have $x_1+x_2=1$. The additional inequalities give $x_1,x_2\geq 0$. Hence in this case $r_{ij}$ is a convex combination of two points in $P_n$: one corresponding to a path finishing in $e_1$ and the other to a path finishing in $e_2$. Finally, we can easily check that zero lifts uniquely to the point of $P_n$ corresponding to the path $E(12)$. Indeed, from the equation defining $Q_n$ we have $x_1+x_2=2$, and from the inequalities, since $x_3=0$, we have $x_1=x_2=1$. Therefore every lift $y$ of $\pi(x)$ to $Q_n$ can be written as a convex combination of points generating $P_n$, and hence $y\in P_n$. Consequently $x\in P_n$ and hence $Q_n\subseteq P_n$. □
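The easy inclusion $P_n\subseteq Q_n$ can be spot-checked numerically. The Python sketch below builds the leaf-to-leaf path vectors of a hypothetical quartet tree and tests them against a description of $Q_n$ that we assume from the cherry argument above: nonnegativity, the three triangle inequalities at every inner node, and terminal-edge coordinates summing to 2. The tree and the names `EDGES`, `path_vector` and `in_Q` are illustrative, not taken from the paper.

```python
from itertools import combinations

# Hypothetical quartet tree: leaves 1..4, inner nodes "a" (cherry {1, 2}) and "b".
EDGES = [(1, "a"), (2, "a"), ("a", "b"), (3, "b"), (4, "b")]
LEAVES = [1, 2, 3, 4]
INNER = ["a", "b"]


def neighbours(node):
    """Nodes adjacent to `node` in the tree."""
    return [v if u == node else u for (u, v) in EDGES if node in (u, v)]


def edge_index(u, v):
    """Position of the edge {u, v} in EDGES."""
    return next(k for k, e in enumerate(EDGES) if set(e) == {u, v})


def path_vector(i, j):
    """0/1 indicator vector of the edges on the unique path from leaf i to leaf j."""
    stack = [(i, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == j:
            return [1 if k in used else 0 for k in range(len(EDGES))]
        for nb in neighbours(node):
            if nb != parent:
                stack.append((nb, node, used + [edge_index(node, nb)]))
    raise ValueError("leaves are not connected")


def in_Q(x):
    """Assumed description of Q_n: x >= 0, the three triangle inequalities
    at every inner node, and terminal-edge coordinates summing to 2."""
    if any(c < 0 for c in x):
        return False
    for v in INNER:
        inc = [k for k, e in enumerate(EDGES) if v in e]
        for k in inc:
            o1, o2 = (m for m in inc if m != k)
            if x[o1] + x[o2] - x[k] < 0:
                return False
    terminal = [k for k, e in enumerate(EDGES) if set(e) & set(LEAVES)]
    return sum(x[k] for k in terminal) == 2


vectors = {(i, j): path_vector(i, j) for i, j in combinations(LEAVES, 2)}
print(all(in_Q(x) for x in vectors.values()))  # True
print(vectors[(1, 2)])                          # [1, 1, 0, 0, 0]
```

A vector such as `[1, 0, 0, 0, 1]`, which puts weight on two terminal edges without connecting them, fails the triangle inequality at node `a`, so `in_Q` rejects it.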

Lemma 6.10 shows that $P_n$ has an extremely simple structure. The inequalities give a polyhedral cone and the equation cuts out the polytope $P_n$ as a slice of this cone. The result also gives us the representation of $\Gamma(Q_\delta)$ in terms of the defining equations and inequalities.

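Proposition 6.11 (Structure of $\Gamma(Q_\delta)$). The polytope $\Gamma(Q_\delta)\subset V_\delta$ is given as the intersection of the sets defined by the inequalities in (40) together with $|\delta|+1$ equations given by

$$\begin{aligned}
2y_v &= x_{v\,\mathrm{ch}_1(v)}+x_{v\,\mathrm{ch}_2(v)}-x_{\mathrm{pa}(v)\,v} &&\text{for all } v\neq r \text{ such that } \delta_v=1,\\
2y_r &= x_{r\,\mathrm{ch}_1(r)}+x_{r\,\mathrm{ch}_2(r)}+x_{r\,\mathrm{ch}_3(r)} &&\text{if } \delta_r=1, \text{ and}\\
\textstyle\sum_{e\in E}x_e &= 4. &&
\end{aligned}\tag{41}$$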

From this we can partially understand the structure of $\Gamma_+(Q_\delta)$. First note that $\Gamma_+(f)=\Gamma(f)+\mathbb{R}^d_{\geq 0}$, where the plus denotes the Minkowski sum. The Minkowski sum of two polyhedra is by definition

$$\Gamma_1+\Gamma_2=\{x+y\in\mathbb{R}^d:\ x\in\Gamma_1,\ y\in\Gamma_2\}.$$
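For polytopes given by vertex lists, the Minkowski sum can be computed from pairwise sums, since $\mathrm{conv}(A)+\mathrm{conv}(B)=\mathrm{conv}\{a+b: a\in A,\ b\in B\}$. The minimal Python sketch below covers only bounded polytopes; the unbounded sum $\Gamma(f)+\mathbb{R}^d_{\geq 0}$ would need a separate representation of the cone, which is not shown. The function name and the example sets are hypothetical.

```python
from itertools import product


def minkowski_sum(A, B):
    """All pairwise sums a + b for finite point sets A, B in R^d.
    For polytopes, conv(A) + conv(B) = conv(minkowski_sum(A, B))."""
    return {tuple(p + q for p, q in zip(a, b)) for a, b in product(A, B)}


# Hypothetical example in R^2: a diagonal segment plus the unit square.
segment = [(0, 0), (1, 1)]
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(sorted(minkowski_sum(segment, square)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```

The hull of these seven points is the hexagon with vertices (0, 0), (1, 0), (2, 1), (2, 2), (1, 2), (0, 1); the sum (1, 1) does not become a vertex, so a convex-hull step would be needed to keep only the vertices.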

Lemma 6.12. Let $\Gamma\subset\mathbb{R}^d_{\geq 0}$ be a polytope and let $\Gamma_+$ be the Minkowski sum of $\Gamma$ and the standard cone $\mathbb{R}^d_{\geq 0}$. Then all the facets of $\Gamma_+$ are of the form $\sum_i a_ix_i\geq c$, where $a_i\geq 0$ and $c\geq 0$.
Now we are ready to compute multiplicities of the real log-canonical threshold $\mathrm{RLCT}_0(Q_\delta)$. This completes the proof of Proposition 6.1.

Lemma 6.13 (Computing multiplicities). Let $T$ be a trivalent tree with $n\geq 4$ leaves and rooted in $r$. Let $p\in\mathcal{M}_T$ be such that $\kappa_{ij}=0$ for all $i,j\in[n]$, and let $\omega_0\in\Omega_0$. Let $\delta=\delta(\omega_0)$ be such that $\delta_v=1$ if $s_v^0=1$ and $\delta_v=0$ otherwise. Define $Q_\delta(\omega)$ as in (38). If either $\delta_r=0$, or $\delta_r=1$ and $\delta_v=1$ for all $(r,v)\in E$, then $\mathrm{mult}_0(Q_\delta)=1$.

Proof. A standard result for Minkowski sums says that each face of a Minkowski sum of two polyhedra can be decomposed as a sum of two faces of the summands, and this decomposition is unique. Each facet of $\Gamma_+(Q_\delta)$ is decomposed as a face of the standard cone $\mathbb{R}^{n_e+|\delta|}_{\geq 0}\subset V_\delta$ plus a face of $\Gamma(Q_\delta)$. We say that a face of $\Gamma(Q_\delta)$ induces a facet of $\Gamma_+(Q_\delta)$ if there exists a face of the standard cone such that the Minkowski sum of these two faces gives a facet of $\Gamma_+(Q_\delta)$. However, since the dimension of $\Gamma(Q_\delta)$ is lower than the dimension of the resulting polyhedron, one face of $\Gamma(Q_\delta)$ can induce more than one facet of $\Gamma_+(Q_\delta)$. In particular $\Gamma(Q_\delta)$ itself induces more than one facet, and one of them is $F_0$ given by

$$\sum_{e\in E}x_e\geq 4.$$

Every facet of $\Gamma_+(Q_\delta)$ containing $\tfrac{4}{n}\mathbf{1}$, after normalizing the coefficients to sum to $n$, i.e. $\sum_v\alpha_v+\sum_e\beta_e=n$, is of the form

$$\sum_{v\in V}\alpha_v y_v+\sum_{e\in E}\beta_e x_e\geq 4,\tag{42}$$

where by Lemma 6.12 we can assume that $\alpha_v,\beta_e\geq 0$.

Our approach can be summarized as follows. Using Construction 6.8 we provide coordinates of a point $q\in\Gamma(Q_\delta)$ such that $\tfrac{4}{n}\mathbf{1}$ lies on the boundary of $q+\mathbb{R}^{n_e+|\delta|}_{\geq 0}$. Then $\tfrac{4}{n}\mathbf{1}$ can only lie on faces of $\Gamma_+(Q_\delta)$ induced by faces of $\Gamma(Q_\delta)$ containing $q$.

First, assume that $\delta_r=0$, which corresponds to the case when the root $r$ represents a non-degenerate random variable. Consider the point $q\in\Gamma(Q_\delta)$ induced by the network of $2n$ paths given in Construction 6.8. Since $x_e=4/n$ for all $e\in E$, from the description of $\Gamma(Q_\delta)$ in Proposition 6.11 we can check that all defining inequalities are strict at this point. Therefore $q$ lies in the relative interior of $\Gamma(Q_\delta)$, and the only facets of $\Gamma_+(Q_\delta)$ containing $q$ are those induced by $\Gamma(Q_\delta)$ itself. The equation defining a facet induced by $\Gamma(Q_\delta)$ has to be obtained as a combination of the defining equations: $\sum_{e\in E}x_e=4$ and the $|\delta|$ equations

$$2y_v-x_{v\,\mathrm{ch}_1(v)}-x_{v\,\mathrm{ch}_2(v)}+x_{\mathrm{pa}(v)\,v}=0\tag{43}$$

for all $v\in V$ such that $\delta_v=1$. We check which combinations attain the form of the induced inequality given in (42). The first inequality, defining $F_0$, is already of this form (cf. equation (39)); the sum of all its coefficients is $n$ since there are $n$ terminal edges. Any other facet has to be obtained by adding to the first equation (since the right-hand side in (42) is 4) a nonnegative combination of the equations in (43) (nonnegative since the coefficients of the $y_v$ need to be nonnegative). However, the sum of the coefficients in each equation of (43) is $+1$, so any nonzero such combination raises the sum of coefficients above $n$, contradicting the normalization in (42). Consequently, if $\delta_r=0$ the codimension of the face hit by $\tfrac{4}{n}\mathbf{1}$ is 1, and hence by Theorem 6.6 we have $\mathrm{mult}_0(Q_\delta)=1$.



Second, if $\delta_r=1$ and $\delta_v=1$ for all children $v$ of $r$ in $T$, then, since all the nodes adjacent to $r$ (denote them by $a,b,c$) are inner, we have three different ways of conducting the construction of the $n$-path network in Construction 6.8 (by omitting each of the incident edges). Hence we get three different points, and their barycenter satisfies $x_{ra}=x_{rb}=x_{rc}=\tfrac{8}{3n}$ and $x_e=\tfrac{4}{n}$ for all the other edges; $y_r=\tfrac{4}{n}$, $y_a=y_b=y_c=\tfrac{8}{3n}$ and $y_v=\tfrac{2}{n}$ for all the other inner nodes. Denote this point by $q$. By the facet description of $\Gamma(Q_\delta)$ derived in Proposition 6.11 we can check that this point does not lie on any of the facets of $\Gamma(Q_\delta)$, and hence it is a relative interior point of the polytope. As in the first case, this means that the facets of $\Gamma_+(Q_\delta)$ containing $q$ are induced by $\Gamma(Q_\delta)$. By Proposition 6.11 the affine span is given by the equation defining $F_0$, the equations (43) for all inner nodes $v$ apart from the root, and, in addition, for the root

$$2y_r-x_{ra}-x_{rb}-x_{rc}=0.\tag{44}$$

Since the sum of coefficients in (44) is negative, we cannot use the same argument as in the first case. Instead we add to $\sum_{e\in E}x_e=4$ a nonnegative combination of the equations in (43), each with coefficient $t_v\geq 0$, and then add (44) with coefficient $\sum_{v\neq r}t_v$. The sum of coefficients in the resulting equation is $n$ by construction. The coefficient of $x_{ra}$ is $t_a-\sum_{v\neq r}t_v$. Since it has to be nonnegative, it follows that $t_v=0$ for all $v$ apart from $a$. However, by checking the coefficient of $x_{rb}$, one deduces that $t_v=0$ for all inner nodes $v$. Consequently, the only possible facet of $\Gamma_+(Q_\delta)$ containing $\tfrac{4}{n}\mathbf{1}$ is $F_0$, and hence again $\mathrm{mult}_0(Q_\delta)=1$.

□ 
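The coordinates used in both cases of the proof can be checked with exact arithmetic. The Python sketch below takes a hypothetical trivalent tree with $n=7$ leaves whose root has three inner children, builds the point with all $x_e=4/n$ and $y_v=2/n$ (first case, assuming $\delta_v=1$ at every non-root inner node) and the barycenter $q$ of the second case, and verifies the equations of Proposition 6.11 together with the coefficient sums $+1$ and $-1$ of (43) and (44). The tree and function names are illustrative only, and the inequalities (40) are not checked.

```python
from fractions import Fraction as F

# Hypothetical trivalent tree with n = 7 leaves, rooted at r; r has three
# inner children a, b, c (the setting of the second case of the proof).
CHILDREN = {
    "r": ["a", "b", "c"],
    "a": ["a1", "l1"],
    "a1": ["l2", "l3"],
    "b": ["l4", "l5"],
    "c": ["l6", "l7"],
}
PARENT = {ch: v for v, chs in CHILDREN.items() for ch in chs}
LEAVES = [v for v in PARENT if v not in CHILDREN]
n = len(LEAVES)


def case1_point():
    """delta_r = 0: x_e = 4/n on every edge, y_v = 2/n at non-root inner nodes
    (assuming delta_v = 1 at all of them)."""
    x = {(PARENT[v], v): F(4, n) for v in PARENT}
    y = {v: F(2, n) for v in CHILDREN if v != "r"}
    return x, y


def case2_point():
    """delta_r = 1: barycenter q of the three constructions, as stated in the proof."""
    x, y = case1_point()
    for v in CHILDREN["r"]:
        x[("r", v)] = F(8, 3 * n)
        y[v] = F(8, 3 * n)
    y["r"] = F(4, n)
    return x, y


def satisfies_equations(x, y, delta_r):
    """Equations (41) of Proposition 6.11 plus sum of terminal coordinates = 4."""
    ok = all(
        2 * y[v] == sum(x[(v, ch)] for ch in CHILDREN[v]) - x[(PARENT[v], v)]
        for v in CHILDREN
        if v != "r"
    )
    if delta_r == 1:
        ok = ok and 2 * y["r"] == sum(x[("r", ch)] for ch in CHILDREN["r"])
    return ok and sum(x[(PARENT[leaf], leaf)] for leaf in LEAVES) == 4


# Coefficient sums of the linear forms in (43) and (44).
COEFF_SUM_43 = 2 - 1 - 1 + 1   # 2y_v - x_{v ch1} - x_{v ch2} + x_{pa(v) v}
COEFF_SUM_44 = 2 - 1 - 1 - 1   # 2y_r - x_{ra} - x_{rb} - x_{rc}

print(satisfies_equations(*case1_point(), delta_r=0))  # True
print(satisfies_equations(*case2_point(), delta_r=1))  # True
print(COEFF_SUM_43, COEFF_SUM_44)                       # 1 -1
```

Using `Fraction` rather than floats keeps equalities such as $2y_a=\tfrac{16}{3n}$ exact, so a failed check points to a wrong coordinate rather than rounding.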

7. Proof of Theorem 1.2 

In this section we complete the proof of Theorem 1.2. We split the proof into 
three steps. 

Step 1. To approximate $\log Z(N)$ it suffices to approximate $\log I(N)$, where $I(N)$ is given by (11), because $\log Z(N)=\hat\ell_N+\log I(N)$. By Theorem 2.3, equivalently, we can compute $\mathrm{RLCT}_{\Omega_T}(f;\varphi)$, where $f$ is the normalized log-likelihood and $\varphi$ is the prior distribution satisfying (A1). By Theorem 4.2 and Theorem 4.6 this real log-canonical threshold is equal to $\mathrm{RLCT}_{\Omega_T}(\mathcal{I})$, where $\mathcal{I}$ is the ideal given by (24).
Step 2. We compute $\mathrm{RLCT}_{\Omega_T}(\mathcal{I})$ separately in the case $n=3$. If $T$ is rooted in the inner node, the approximation for $\log Z(N)$ follows from Theorem 4 in [16]. Thus if $\hat{E}=E$, which in [16] corresponds to the type 2 singularity, then

$$\log Z(N)=\hat\ell_N-2\log N+O(1)\quad\text{or}\quad\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=(2,1).\tag{45}$$

Since all the neighbours of the root are leaves, and hence by (A2) they are non-degenerate, we need only to make sure that the first equation in Theorem 1.2 gives (45). This follows from the fact that $l_2=l_3=0$. In the case $|\hat{E}|=1$ (type 1 singularity) we have

$$\log Z(N)=\hat\ell_N-\tfrac{5}{2}\log N+O(1)\quad\text{or}\quad\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\big(\tfrac{5}{2},1\big).$$

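The second equation in Theorem 1.2 holds since $l_2=1$, $l_3=0$ and $c=0$. If $\hat{E}=\emptyset$ we have

$$\log Z(N)=\hat\ell_N-\tfrac{7}{2}\log N+O(1)\quad\text{or}\quad\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\big(\tfrac{7}{2},1\big),$$

which again is true since $l_2=0$, $l_3=1$ and $c=0$.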
Now assume that $T$ is rooted in a leaf, say 1. If there exist $i,j\in\{1,2,3\}$ such that $\kappa_{ij}\neq 0$ (or $|\hat{E}|\leq 1$), then $\hat{V}=\emptyset$ and by Proposition 3.2

$$\log Z(N)=\hat\ell_N-\tfrac{7-2|\hat{E}|}{2}\log N+O(1)\quad\text{or}\quad\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\Big(\tfrac{7-2|\hat{E}|}{2},1\Big).$$

If $\hat{E}=E$, then by Theorem 4.6, for every $\omega_0\in\Omega_0$,

$$\mathrm{RLCT}_{\omega_0}(\mathcal{I})=\big(\tfrac{3}{2},0\big)+\mathrm{RLCT}_{\omega_0}(\mathcal{J}).$$

Moreover, by Lemma 6.3, for every $\omega_0\in\Omega_0$,

RLCT Wo (,7) = RLCT ((r?i )ft ?7ft i 2,»?i,fe%,3» s h,' , %,2 r ?'!-,3}), 

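where $\delta_h=1$ if $s_h^0=1$ and $\delta_h=0$ otherwise. It can be checked directly, using the Newton diagram method and Theorem 6.6, that $\mathrm{RLCT}_{\omega_0}(\mathcal{J})=(\tfrac34,1)$ both if $\delta_h=0$ and if $\delta_h=1$, and hence $\mathrm{RLCT}_{\omega_0}(\mathcal{I})=(\tfrac94,1)$. Since the points in $\Omega_0$ such that $s_h^0\neq 1$ lie in the interior of $\Omega_T$, for these points $\mathrm{RLCT}_{\omega_0}(\mathcal{I})=\mathrm{RLCT}_{U_{\omega_0}}(\mathcal{I})$, where $U_{\omega_0}$ is a neighbourhood of $\omega_0$ in $\Omega_T$. Hence by (8) we have that

$$\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\min_{\omega_0\in\Omega_0}\mathrm{RLCT}_{U_{\omega_0}}(\mathcal{I})\leq\min_{\omega_0\in\Omega_0\cap\operatorname{int}(\Omega_T)}\mathrm{RLCT}_{\omega_0}(\mathcal{I})=\big(\tfrac94,1\big).$$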

On the other hand, by (7) and then Proposition 6.2,

$$\mathrm{RLCT}_{\Omega_T}(\mathcal{I})\geq\min_{\omega_0\in\Omega_0}\mathrm{RLCT}_{\omega_0}(\mathcal{I})=\big(\tfrac94,1\big).$$

It follows that

$$\log Z(N)=\hat\ell_N-\tfrac{9}{4}\log N+O(1)\quad\text{or}\quad\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\big(\tfrac94,1\big),\tag{46}$$

which gives the second equation in Theorem 1.2, since in this case $l_2=l_3=c=0$.
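The $n=3$ values above can be cross-checked against the coefficient $\tfrac14(3n+l_2+5l_3-c)$ of $\log N$. That the constant $c$ enters with a minus sign is our assumption, consistent with Step 3, Case 2 below, where the coefficient drops by $\tfrac14$; the helper names are hypothetical. The sketch also evaluates the leading terms $\hat\ell_N-\lambda\log N+(m-1)\log\log N$ of $\log Z(N)$ for a given pair $(\lambda,m)$, dropping the $O(1)$ remainder.

```python
import math
from fractions import Fraction


def log_marginal_leading_terms(loglik_hat, lam, m, N):
    """Leading terms l_hat_N - lam*log N + (m - 1)*log log N of log Z(N);
    the O(1) remainder is dropped."""
    return loglik_hat - float(lam) * math.log(N) + (m - 1) * math.log(math.log(N))


def lam_from_counts(n, l2, l3, c=0):
    """(3n + l2 + 5*l3 - c)/4, with c entering as a downward shift (assumption)."""
    return Fraction(3 * n + l2 + 5 * l3 - c, 4)


# n = 3 cases from Step 2 in which c = 0 is stated in the text.
assert lam_from_counts(3, l2=1, l3=0) == Fraction(5, 2)   # |E_hat| = 1
assert lam_from_counts(3, l2=0, l3=1) == Fraction(7, 2)   # E_hat empty
assert lam_from_counts(3, l2=0, l3=0) == Fraction(9, 4)   # rooted at a leaf, E_hat = E, (46)

# Leading terms for a hypothetical maximized log-likelihood at N = 1000.
print(log_marginal_leading_terms(-1500.0, lam_from_counts(3, 0, 0), 1, 1000))
```

With $m=1$ the $\log\log N$ term vanishes, which is the situation in every case of Step 2.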

Step 3, Case 1. Assume now that $n\geq 4$ and $r\notin\hat{V}$. In this case every $T_i$ for $i=1,\ldots,k$ is rooted in one of its leaves. Hence $\mathrm{rlct}_{\omega_0}(\mathcal{J}(T_i))=|L_i|/4$ for every $i=1,\ldots,k$: if $|L_i|\neq 3$ this follows from Proposition 6.1, and if $|L_i|=3$ it follows from Step 2 above. By Lemma 5.1 and Proposition 3.2, for every $\omega_0\in\Omega_0$ we have that

$$\mathrm{rlct}_{\omega_0}(\mathcal{I})=\frac{n}{2}+\sum_{i=1}^{k}\frac{n_v^i+n_e^i-l_1^i-2l_2^i}{2}+\sum_{i=1}^{k}\frac{|L_i|}{4},$$

where $n_v^i$, $n_e^i$, $l_1^i$, $l_2^i$ are respectively the numbers of vertices, edges, degree one nodes and degree two nodes of $S_i$ (degrees taken in $\hat{T}$), and $L_i$ is the set of leaves of $T_i$. We use three simple formulas: $\sum_i n_v^i=l_1+l_2+l_3$ (i.e. only the degree zero nodes of $\hat{T}$ do not lie in the $S_i$'s), $\sum_i n_e^i=|E\setminus\hat{E}|$ (i.e. $E\setminus\hat{E}$ is the set of all edges of all the $S_i$'s) and $\sum_i|L_i|=l_2+n-l_1$ (i.e. the leaves of all the $T_i$'s are precisely the degree two nodes of $\hat{T}$ and those leaves of $T$ which have degree zero). Moreover, for any graph with vertex set $V$ and edge set $E$, $\sum_{v\in V}\deg(v)=2n_e$ (see e.g. Corollary 1.2.2 in [19]). Therefore, with this formula applied to $\hat{T}$, we have $l_1+2l_2+3l_3=2|E\setminus\hat{E}|$. Using these four formulas we obtain $\mathrm{rlct}_{\omega_0}(\mathcal{I})=\tfrac14(3n+l_2+5l_3)$.
Moreover, since $\delta_r=0$ for all $\omega_0\in\Omega_0$, by Lemma 6.13 $\mathrm{mult}_0(\mathcal{J}(T_i))=1$ for every $\omega_0\in\Omega_0$. Therefore,

$$\mathrm{RLCT}_{\omega_0}(\mathcal{I})=\Big(\tfrac14(3n+l_2+5l_3),\,1\Big).\tag{47}$$
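The step from the four counting formulas to $\tfrac14(3n+l_2+5l_3)$ is pure bookkeeping and can be verified with exact fractions. The sketch below substitutes the aggregated counts into $\frac n2+\sum_i\frac{n_v^i+n_e^i-l_1^i-2l_2^i}{2}+\sum_i\frac{|L_i|}{4}$ (the summand is our reading of the formula obtained from Lemma 5.1 and Proposition 3.2) and checks the result over a grid of values. The identity is purely algebraic, so the grid does not need to consist of realizable trees; the function name is hypothetical.

```python
from fractions import Fraction


def rlct_from_counts(n, l1, l2, l3):
    """n/2 + sum_i (n_v^i + n_e^i - l_1^i - 2 l_2^i)/2 + sum_i |L_i|/4, using
      sum_i n_v^i = l1 + l2 + l3,
      sum_i n_e^i = |E minus E_hat| = (l1 + 2*l2 + 3*l3)/2   (degree-sum formula),
      sum_i |L_i| = l2 + n - l1."""
    n_v = l1 + l2 + l3
    n_e = Fraction(l1 + 2 * l2 + 3 * l3, 2)
    leaves = l2 + n - l1
    return Fraction(n, 2) + (n_v + n_e - l1 - 2 * l2) / 2 + Fraction(leaves, 4)


# The result agrees with (3n + l2 + 5*l3)/4 on the whole grid.
for n in range(4, 12):
    for l1 in range(6):
        for l2 in range(6):
            for l3 in range(6):
                assert rlct_from_counts(n, l1, l2, l3) == Fraction(3 * n + l2 + 5 * l3, 4)
```

Note that $l_1$ cancels: degree one nodes contribute to $\sum_i n_v^i$, to $\sum_i n_e^i$ and to $\sum_i|L_i|$, and these contributions sum to zero.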

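Now we show that $\mathrm{RLCT}_{\Omega_T}(\mathcal{I})$ also has the same form. Let $\omega_2$ be a point in $\Omega_0$ such that $\delta_v\neq 1$ for all $v\in V$, and let $\omega_1\in\Omega_{\mathrm{deep}}$. Equation (47) holds both for $\omega_0=\omega_1$ and for $\omega_0=\omega_2$, and hence $\mathrm{RLCT}_{\omega_1}(\mathcal{I})=\mathrm{RLCT}_{\omega_2}(\mathcal{I})$. However, since $\omega_2$ is an inner point of $\Omega_T$, it follows from the definition of $\mathrm{RLCT}_{\Omega_T}(\mathcal{I})$ as the minimum over all points in $\Omega_T$ that

$$\mathrm{RLCT}_{\Omega_T}(\mathcal{I})\leq\mathrm{RLCT}_{\omega_2}(\mathcal{I}).$$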
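On the other hand, by (7) and Proposition 6.2, writing $U_{\omega_0}$ for a neighbourhood of $\omega_0$ in $\Omega_T$,

$$\mathrm{RLCT}_{\omega_1}(\mathcal{I})=\min_{\omega_0\in\Omega_0}\mathrm{RLCT}_{\omega_0}(\mathcal{I})\leq\min_{\omega_0\in\Omega_0}\mathrm{RLCT}_{U_{\omega_0}}(\mathcal{I})=\mathrm{RLCT}_{\Omega_T}(\mathcal{I}).$$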

Therefore, if $r\notin\hat{V}$ then in fact $\mathrm{RLCT}_{\Omega_T}(\mathcal{I})=\big(\tfrac14(3n+l_2+5l_3),1\big)$ and hence

$$\log Z(N)=\hat\ell_N-\tfrac14(3n+l_2+5l_3)\log N+O(1).$$

Hence in this case the main formula in Theorem 1.2 is proved, since $c=0$.
Step 3, Case 2. Let now $n\geq 4$ and $r\in\hat{V}$. Let $1\leq j\leq k$ be such that $r$ is an inner node of $T_j$, and let $\omega_0\in\Omega_0$. For all $i\neq j$, $T_i$ is rooted in one of its leaves. Therefore, by Lemma 6.7, Lemma 6.13 and Step 2 above, for all $i\neq j$ we have $\mathrm{RLCT}_{\omega_0}(\mathcal{J}(T_i))=(|L_i|/4,1)$. It remains to compute $\mathrm{RLCT}_{\omega_0}(\mathcal{J}(T_j))$. If $|L_j|=3$ then $\mathrm{RLCT}_{\omega_0}(\mathcal{J}(T_j))=(1/2,1)=((|L_j|-1)/4,1)$ by Step 2 above (cf. (45)). Therefore in this case the computations are the same as in Step 3, Case 1, but with a difference of $\tfrac14$ in the real log-canonical threshold. Therefore we obtain

$$\log Z(N)=\hat\ell_N-\tfrac14(3n+l_2+5l_3-1)\log N+O(1).$$

However, if $|L_j|\geq 4$, then by Lemma 6.7 $\mathrm{rlct}_0(\mathcal{J}(T_j))=|L_j|/4$, and hence, as in Step 3, Case 1, we have $\sum_{i=1}^k\mathrm{rlct}_0(\mathcal{J}(T_i))=\tfrac14(n-l_1+l_2)$. Therefore

$$\mathrm{rlct}_{\Omega_T}(\mathcal{I})=\tfrac14(3n+l_2+5l_3).$$

We compute the multiplicity by considering different subcases. If all the neighbours of $r$ are degenerate, then for all points $\omega_0\in\Omega_{\mathrm{deep}}$ we have $\delta_r=1$ and $\delta_v=1$ for all neighbours $v$ of $r$. It follows from Lemma 6.13 that $\mathrm{mult}_{\omega_0}(\mathcal{J}(T_j))=1$ and hence $\mathrm{mult}_{\Omega_T}(\mathcal{I})=1$. Therefore,

$$\log Z(N)=\hat\ell_N-\tfrac14(3n+l_2+5l_3)\log N+O(1).$$

Otherwise we do not have explicit bounds on the multiplicity. Since $\mathrm{mult}_{\Omega_T}(\mathcal{I})\geq 1$,

$$\log Z(N)=\hat\ell_N-\tfrac14(3n+l_2+5l_3)\log N+(m-1)\log\log N+O(1),$$

where $m\geq 1$. This finishes the proof of Theorem 1.2. □
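The case analysis of Step 3 can be summarized as a small function returning the coefficient of $\log N$ for $n\geq 4$. This is a sketch of the bookkeeping in this proof only, with hypothetical argument names; the multiplicity $m$ is known to equal 1 only in the subcases established above, so it is not returned.

```python
from fractions import Fraction


def log_n_coefficient_step3(n, l2, l3, root_in_V_hat, root_subtree_leaves=None):
    """Coefficient of log N in log Z(N) according to Step 3 (n >= 4).

    Case 1 (r not in V_hat) and Case 2 with |L_j| >= 4: (3n + l2 + 5*l3)/4.
    Case 2 with |L_j| = 3:                             (3n + l2 + 5*l3 - 1)/4.
    """
    if n < 4:
        raise ValueError("Step 3 assumes n >= 4; n = 3 is handled in Step 2")
    base = 3 * n + l2 + 5 * l3
    if root_in_V_hat:
        if root_subtree_leaves is None or root_subtree_leaves < 3:
            raise ValueError("Case 2 needs |L_j| >= 3 for the subtree T_j containing r")
        if root_subtree_leaves == 3:
            base -= 1
    return Fraction(base, 4)


print(log_n_coefficient_step3(6, 2, 1, root_in_V_hat=False))                        # 25/4
print(log_n_coefficient_step3(6, 2, 1, root_in_V_hat=True, root_subtree_leaves=3))  # 6
```

The function deliberately rejects $n=3$, where the values come from Step 2 instead.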